OpenVMS Performance Tips - PowerPoint PPT Presentation

About This Presentation

Slides: 72
Provided by: pel53
Learn more at: http://de.openvms.org

Transcript and Presenter's Notes



1
OpenVMS Performance Tips & Tricks
Guy Peleg President Maklee Engineering
guy.peleg_at_maklee.com
2
Performance: Why should you care?
Application Tuning
Oracle Tuning
System Tuning
Java Tuning
3
The Golden Rules
Source: OpenVMS Information Desk, October 2004
  • The best performing code is the code not being
    executed
  • The fastest I/Os are those avoided
  • Idle CPUs are the fastest CPUs
  • Look at your code... be ready to be surprised

4
RMS
  • RMS holds great potential for improving
    performance
  • The C RTL uses RMS
  • Most C applications would benefit from RMS tuning

5
RMS
  • RMS parameters related to performance
  • FAB/RAB parameters (should you have access to the
    code)
  • ASY, RAH, WBH, DFW, SQO
  • ALQ, DEQ
  • MBC, MBF
  • NOSHR, NQL, NLK
  • SET RMS
  • /SYSTEM, /PROCESS
  • /BUFFER_COUNT=n
  • /BLOCK_COUNT=n
  • SYSGEN> SET RMS_SEQFILE_WBH 1
  • Don't be afraid of Global Buffers
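Where you do have access to the code, several of the FAB/RAB settings above can be requested from C without filling in RMS structures by hand. The sketch below assumes the OpenVMS C RTL's optional RMS keyword arguments to fopen() ("mbc=", "mbf=", "deq=", "rop=wbh"); check the C RTL reference for the keywords your version accepts. The helper name open_for_bulk_write is hypothetical, and the values simply echo the SET RMS settings used in the FTP and gZIP tests, not a recommendation.

```c
#include <stdio.h>

/*
 * Open a file for large sequential writes with RMS tuning hints.
 * On OpenVMS the C RTL's fopen() accepts optional RMS keyword strings
 * after the mode; on other platforms this falls back to plain fopen().
 * Values are illustrative only (they mirror the SET RMS test settings).
 */
FILE *open_for_bulk_write(const char *path)
{
#ifdef __VMS
    return fopen(path, "w",
                 "mbc=127",    /* multiblock count: blocks per RMS I/O  */
                 "mbf=8",      /* multibuffer count: number of buffers  */
                 "deq=60000",  /* default extension quantity, in blocks */
                 "rop=wbh");   /* write-behind                          */
#else
    return fopen(path, "w");  /* no RMS here; stdio defaults apply */
#endif
}
```

Because the keywords travel with the open call, this tunes one file without touching the process- or system-wide SET RMS defaults.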

6
FTP Performance Simple RMS Tuning
  • FTP into IT13 and transfer the file
  • Brutel> ftp it13
  • 220 IT13.bruclass.com FTP Server (Version 5.6)
    Ready.
  • Connected to ALPH13.BRUCLASS.COM.
  • Name (ALPH13.BRUCLASS.COM:bru_guy): peleg
  • 331 Username peleg requires a Password
  • Password:
  • 230 User logged in.
  • FTP> cd $1$dga703:[000000]
  • 250-CWD command successful.
  • 250 New default directory is $1$DGA703:[000000]
  • FTP> put HP-I64VMS-JAVA150-V0105-1-1.PCSI_SFX_I64EXE
  • 200 TYPE set to IMAGE.
  • 200 PORT command successful.
  • 150 Opening data connection for
    $1$DGA703:[000000]HP-I64VMS-JAVA150-V0105-1-1.PCSI_SFX_I64EXE
    (192.168.1.7,49428)
  • 226 Transfer complete.
  • local: SYS$SYSDEVICE:[BRU_GUY]HP-I64VMS-JAVA150-V0105-1-1.PCSI_SFX_I64EXE;1
    remote: HP-I64VMS-JAVA150-V0105-1-1.PCSI_SFX_I64EXE

7
FTP Performance Simple RMS Tuning
  • set rms/sys/exte=60000/seq/block=127/buf=8
  • mc sysgen
  • SYSGEN> SET RMS_SEQ 1
  • SYSGEN> W A
  • SYSGEN> Exit
  • Throughput increased by more than 50%
  • FTP> put HP-I64VMS-JAVA150-V0105-1-1.PCSI_SFX_I64EXE
  • 200 TYPE set to IMAGE.
  • 200 PORT command successful.
  • 150 Opening data connection for
    $1$DGA703:[000000]HP-I64VMS-JAVA150-V0105-1-1.PCSI_SFX_I64EXE
    (192.168.1.7,49432)
  • 226 Transfer complete.
  • local: SYS$SYSDEVICE:[BRU_GUY]HP-I64VMS-JAVA150-V0105-1-1.PCSI_SFX_I64EXE;1
    remote: HP-I64VMS-JAVA150-V0105-1-1.PCSI_SFX_I64EXE
  • 286026004 bytes sent in 00:00:31.83 seconds
    (8773.78 Kbytes/s)
  • 200 TYPE set to ASCII.

8
gZIP RMS
  • gZIP is written in C; its I/Os eventually reach RMS
  • 1.6 GHz rx2600, MSA30, OpenVMS V8.3
  • Test 1
  • Compress 5.67 GB saveset
  • Decompress 2.74 gZIP archive
  • Default O/S & RMS settings
  • Test 2
  • Compress 5.67 GB saveset
  • Decompress 2.74 gZIP archive
  • SET RMS/BLOCK=127/EXTEN=60000/BUFFER=8,
    RMS_SEQFILE_WBH=1

9
gZIP RMS
Elapsed Time in Minutes (less is better)
10
Smaller MBC for Random Access
  • Times to read 1,000,000 records randomly (same
    sequence of records; MBC passed as the first
    parameter to frand)

    MBC   Elapsed time (ms)
    ---   -----------------
      1   31205, 31233 (two runs)
      2   31680
      4   32607
      8   33698
     16   36101
     32   42823
     64   54761
     96   66343
    124   80122
11
RMS & fsync()
  • Writing small amounts of data?
  • Using fsync()?
  • Slow!
  • Setting MBC & MBF to 1 is (almost!) identical
  • Still need to take care of EOF
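A sketch of the two approaches side by side, assuming POSIX fsync() (also provided by the OpenVMS C RTL) and, on OpenVMS only, the C RTL's optional "mbc=" and "mbf=" fopen() keywords. Both function names are hypothetical, and whether the MBC/MBF=1 variant gives the durability your application needs is something to verify, not assume.

```c
#include <stdio.h>
#include <unistd.h>   /* fsync(), fileno() via POSIX; also in the OpenVMS C RTL */

/*
 * The slow pattern: write one small record, then force it to disk.
 * Every call pays for an fflush() plus an fsync().
 */
int append_record_synced(FILE *fp, const char *rec)
{
    if (fputs(rec, fp) == EOF || fputc('\n', fp) == EOF)
        return -1;
    if (fflush(fp) != 0)          /* hand stdio's buffer to the OS / RMS  */
        return -1;
    return fsync(fileno(fp));     /* wait until the data reaches the disk */
}

/*
 * The alternative from the slide: a one-block buffer and a single buffer.
 * The RMS keywords exist only on OpenVMS; EOF handling is still up to
 * the application.
 */
FILE *open_small_write_file(const char *path)
{
#ifdef __VMS
    return fopen(path, "a", "mbc=1", "mbf=1");
#else
    return fopen(path, "a");
#endif
}
```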

12
Sequential Writes
  • Frequent file expansions are expensive
  • Typically seen with
  • BACKUP savesets
  • Database Imports
  • FTPing large files
  • The significant amount of time spent expanding files
    impacts performance
  • If possible, pre-allocate files (container
    files)
  • Limit the number of expansions on a volume
  • SET VOLUME/EXTENSION=65535
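To see why both pre-allocation and a large extension quantity help, it is enough to count expansions. The sketch below is a simplified model under stated assumptions: a fixed initial allocation (ALQ), a fixed extension size (DEQ or the volume default), sizes in 512-byte blocks, and no cluster rounding. expansions_needed is a hypothetical helper, and the numbers in the note are illustrative.

```c
/*
 * Simplified model (hypothetical helper): how many times must a file be
 * extended to grow to file_blocks, given an initial allocation (alq) and
 * a fixed extension quantity (deq)? All sizes are 512-byte blocks;
 * cluster rounding is ignored. Precondition: deq > 0.
 */
unsigned long expansions_needed(unsigned long file_blocks,
                                unsigned long alq,
                                unsigned long deq)
{
    if (file_blocks <= alq)
        return 0;                          /* fully pre-allocated */
    unsigned long remaining = file_blocks - alq;
    return (remaining + deq - 1) / deq;    /* ceiling division */
}
```

With these illustrative numbers, a 10,000,000-block file needs 10,000 expansions at 1,000 blocks each, 153 at 65,535 blocks each, and none if it is pre-allocated.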

13
Black Magic
  • What would you say about improving system
    performance by 5-20%?
  • A typical response would be "What does it
    take?"
  • Nothing! Just a small change to one SYSGEN
    parameter
  • ...and some physical memory
  • Sounds interesting?

14
Introducing the VHPT
  • Each CPU contains a translation buffer (TB)
  • Special cache to hold recent translations of
    virtual memory addresses to physical addresses
  • When a TB miss occurs the O/S has to resolve the
    translation by walking the page tables
  • Itanium provides an extra layer for resolving
    addresses: the Virtual Hash Page Table (VHPT)
  • VHPT: a linear array of 32-byte entries
  • Created by OpenVMS at boot time but not accessed
    by it

15
VHPT
  • Order of use
  • CPU TB cache
  • VHPT
  • OpenVMS performs a 3-level address translation
    (walks the page tables)
  • The VHPT is sized by a system parameter:
    VHPT_SIZE
  • Default value of 1 means allocate 32KB per CPU
    for the VHPT

16
VHPT
  • Default VHPT settings should be sufficient for
    small applications (up to 8MB of virtual address
    space).
  • Large applications with poor locality would
    benefit from increasing the VHPT.
  • Generally speaking, an application that benefits
    from enabling HyperThreads (HT) would benefit from
    an increase to the VHPT.
  • YMMV !!

17
VHPT Benchmark
  • The following charts illustrate the impact that
    increasing the VHPT had on Oracle batch jobs
  • rx6600 8 cores
  • OpenVMS V8.3-1H1
  • EVA8000
  • Oracle 10gR2
  • HyperThreads Enabled
  • 64 GB of physical memory
  • With VHPT_SIZE set to 10000, 2.5GB of physical memory
    is allocated for the VHPT.

18
Oracle Batch job A
23% performance increase
Elapsed Time in Minutes (less is better)
19
Oracle Batch job B
22% performance increase
Elapsed Time in Minutes (less is better)
20
CPU Power Management (IA64 only)
  • CPUs may be placed in a lower power mode when
    idle.
  • Reduces energy costs for the system.
  • SYSGEN parameter CPU_POWER_MGMT turns this
    feature on/off.
  • May impact performance.
  • In a recent engagement we noted a 30% performance
    improvement on an rx6600 by turning power
    management off (SET CPU_POWER_MGMT 0)

21
Shadowed RAM disk
  • Use a shadowed RAM disk for applications that
    frequently read data from disk.
  • The shadow server will read from memory and will
    write to both devices.
  • Forces data to remain resident in memory
  • Significantly boosts performance when files are
    opened cluster-wide by multiple users
  • XFC will not help
  • Beneficial if the file update rate is low compared
    to the read rate
  • Included in the EOE & MCOE packages

22
Physical Disk Vs. RAM disk
  • C application that processes records read from
    sequential file
  • Each I/O: 124 blocks
  • RX2600, OpenVMS V8.3, HSG80

Elapsed time to read 250MB file (less is better)
23
V8.3-1H1
  • When possible upgrade to V8.3-1H1
  • Performance improvements
  • Always aspire to stay current with the O/S version
  • Relink applications using the V8.3-1H1 linker
  • The new linker produces smaller images
  • Reduction between 2% and 18%
  • 0% is also possible
  • Montvale-based systems: there is more than meets
    the eye

24
V8.3-1H1 Addendum kit
  • EFICHK operation is performed during the patch
    installation
  • Performance improvements
    The following product will be installed to destination:
        HP I64VMS VMS831H1I_ADDENDUM V1.0      DISK$SYS831H1:[VMS$COMMON.]

    Portion done: 0%...10%...20%...30%...40%...50%...70%...80%...90%
    %MOUNT-I-FATCHECK, volume created by EFICP version V5.2-5, checking for errors,
     repairing, and updating FAT information.
    %EFICP-W-BADCCNT, FS0:\EFI\VMS\TOOLS\ACPIDUMP.EFI actual cluster count of 126
     does not match the file allocation of 127.
        Filesize of 258232 bytes, requires 508 blocks (rounded to the cluster factor of 4)
        508 blocks shown allocated, but 126 actual clusters (504 blocks) counted in file
        The disk storage (258048 bytes) is smaller than the file size (258232 bytes)
        Truncating file!  CHECK CONTENTS FOR VALIDITY
    %EFICP-I-FATCHECK, 1 errors found, 1 fixed.  18 files in 4 folders checked,
     12095166 total bytes in 5913 clusters
    %EFICP-I-FATCHECK, Updating the FAT EFICP version information to V6.0-1, FAT version 1
    %EFI-I-COPIED, copied FS0:\EFI\VMS\IPB.EXE to PCSI$DESTINATION:[SYSEXE]FLAG_IPB.EXE
    %EFI-I-COPIED, copied PCSI$DESTINATION:[SYSEXE]IPB.EXE to FS0:\EFI\VMS\
    %EFI-I-COPIED, copied FS0:\EFI\VMS\IPB.EXE to PCSI$DESTINATION:[SYSEXE]CHECK_IPB.EXE
    ...100%
    COPIED, copied FS0:\EFI\VMS\VMS_LOADER.EFI to PCSI$DESTINATION:[SYSEXE]CHECK_VMS_LOADER.EFI

25
Resident Images: a mystery
AlphaServer GS1280 7/1150
Elapsed time to execute a program (less is better)
26
Resident Images
AlphaServer GS1280 7/1150
Elapsed time to execute a program (less is better)
27
Resident Images
rx6600 4P/8C 1.6 GHz
Elapsed time to execute a program (less is better)
28
Resident Images
  • Alpha
  • The image activator has to apply the relocations,
    which causes page faults
  • Link using /section=code
  • Avoid /section=data
  • IA64
  • Relocations are mapped into memory (the dynamic
    segment stays in paged pool)

29
SORTing
  • HYPERSORT
  • Multi-threaded
  • define sortshr sys$library:hypersort.exe
  • Spread work files among disks/controllers/adaptors
  • Apart from input/output disks
  • No problem to have input and output on the same disk

30
Sort 100,000,000 Records
  • 100 bytes each
  • 19,531,250 blocks
  • 3 work files
  • 618,000 I/Os (Sort32)
  • 922,000 I/Os (HyperSort)
  • No XFC file caching of input, output or work files
  • HyperSort: elapsed time < CPU time
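The block count above follows directly from the record count and size, since an OpenVMS disk block is 512 bytes: 100,000,000 x 100 bytes = 10,000,000,000 bytes, and 10,000,000,000 / 512 = 19,531,250 blocks. A small check in C, assuming the record bytes are packed with no per-record overhead; blocks_for_records is a hypothetical helper name.

```c
/*
 * Size of a file of fixed-length records in 512-byte disk blocks,
 * rounded up to a whole block. Assumes no per-record overhead.
 */
unsigned long long blocks_for_records(unsigned long long records,
                                      unsigned long long bytes_per_record)
{
    unsigned long long bytes = records * bytes_per_record;
    return (bytes + 511) / 512;    /* round up to whole blocks */
}
```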

31
PEDRIVER Data Compression
  • OpenVMS V8.3
  • Reduces traffic between nodes
  • May be beneficial for Shadow copy and MSCP
    traffic
  • Can be enabled system-wide or per virtual circuit (VC)

32
Turn on compression for one VC
  • SCACP> set vc it14/comp
  • SCACP> sh vc

    IT13 PEA0 VC Summary 30-JAN-2007 07:43:28.02

    Remote  VC     Total             Channels  ECS  MaxPkt  ReXmt      -XmtWindow- Xmt           Total     ------ Most Recent ------
    Node    State  Errors  XmtTMO    Open ECS  Pri  Size    TMO(uSec)  Cur   Max   Mgt  Options  Pkts(SR)  VC Opened Time      VC Closed Time
    ------  -----  ------  --------  ---- ---  ---  ------  ---------  ----  ----  ---  -------  --------  ------------------  --------------
    ALPH50  Open   4       115444    2    2    0    1426    672330.3   33    64    0             889107    21-JAN 13:34:25.78  (No time)
    ALPH40  Open   0       Infinite  2    2    0    1426    516452.3   16    32    0             803545    21-JAN 13:34:25.72  (No time)
    IT14    Open   1       790292    2    2    0    1426    223273.5   32    64    0    CMP      1242954   21-JAN 13:34:25.93  (No time)
    IT13    Open   0       Infinite  1    1    0    1426    3000000.0  1     8     0             5         21-JAN 13:34:23.05  (No time)

33
PEDRIVER Data Compression
  • Copy a 250MB file to an MSCP-served SCSI disk
  • Both systems are rx2600, running OpenVMS V8.3

Elapsed time to copy 250MB file (less is better)
34
Alignment Faults
  • No performance talk is complete without
    mentioning Alignment Faults
  • Alignment faults on Itanium have a serious
    impact on performance
  • May be a (performance) issue on Alpha as well

35
What is an Alignment Fault?
  • An alignment fault occurs when an attempted:
  • Longword memory access is not aligned on a memory
    boundary that is divisible by 4
  • Quadword memory access is not aligned on a memory
    boundary that is divisible by 8
  • Word memory access is not aligned on a boundary
    that is divisible by 2
  • On a fault, control is transferred to code that will
    complete the load/store through shifting, masking
    and setting bits.
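The rule above reduces to a divisibility test on the address. A minimal sketch, with an illustrative function name; it only checks an address value and does not itself trigger or trap anything.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * True if an access of `size` bytes (2 = word, 4 = longword,
 * 8 = quadword) starting at `addr` is naturally aligned, i.e. the
 * address is divisible by the access size. `size` must be nonzero.
 */
bool is_naturally_aligned(uintptr_t addr, size_t size)
{
    return addr % size == 0;
}
```

For example, 0x10001 and 0x1000B, two of the virtual addresses in the data alignment trap messages on the "Reporting Alignment Faults" slide, fail the test for longwords, while 0x10006 is word-aligned but not longword-aligned.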

36
Why Worry?
OpenVMS Monitor Utility
ALIGNMENT FAULT STATISTICS
on node DWARF
3-MAY-2007 14:26:56.27

                      CUR         AVE         MIN         MAX
Kernel Fault Rate     0.00        0.66        0.00        1.33
Exec Fault Rate       0.00        0.00        0.00        0.00
Super Fault Rate      0.00        0.00        0.00        0.00
User Fault Rate       640253.31   662505.00   640253.31   684756.68
Total Fault Rate      640253.31   662505.83   640253.31   684758.31
37
Why Worry?
TIME IN PROCESSOR MODES (CUR) on node DWARF
3-MAY-2007 14:26:59.27
Combined for 2 CPUs (scale 0-200)

Interrupt State
MP Synchronization         9
Kernel Mode              172
Executive Mode
Supervisor Mode
User Mode                 19
Compatibility Mode
Idle Time
38
Let the Compiler Warn You in Advance
  • cc/nomember/warning=enable=alignment align_test
  • int x
  • ................
  • %CC-I-MISALGNDMEM, This member is at offset 1,
    which is not a multiple of the member's alignment
    of longword. Consider padding before this
    member, rearranging the order of member
    declarations, or using pragma member_alignment.
  • at line number 10 in file SYS$SYSDEVICE:[test]ALIGN_TEST.C;7
  • int x
  • ................
  • %CC-I-MISALGNDSTRCT, This member requires
    longword alignment for efficient access, but is
    contained in a struct containing byte alignment.
    Consider using pragma nomember_alignment
    longword.
  • at line number 10 in file SYS$SYSDEVICE:[test]ALIGN_TEST.C;7
  • sub(&zi.x,&zi.a)
  • ....................
  • %CC-W-ALIGNCONFLICT, In this statement, the
    address "&zi.x" has alignment of byte which is
    less than the alignment requirements of
    the destination pointer. Dereferencing the
    destination pointer may cause an alignment fault.
  • at line number 22 in file SYS$SYSDEVICE:[test]ALIGN_TEST.C;7
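The warnings above come from compiling with /nomember, which packs x at offset 1. With member alignment (the usual default on Alpha and Integrity), the compiler instead pads in front of x. The sketch below illustrates the compiler's other suggestion, rearranging the order of member declarations; the struct and function names are made up, and the exact sizes depend on the ABI.

```c
#include <stddef.h>

/* Members in the order a programmer might first write them. */
struct mixed_order {
    char a;      /* 1 byte                                             */
    int  x;      /* longword: padding is inserted before it            */
    char b;      /* 1 byte, followed by tail padding                   */
};

/* Same members, largest first: x stays aligned and padding shrinks. */
struct sorted_order {
    int  x;
    char a;
    char b;
};

/* Bytes of padding the compiler added to each layout. */
size_t padding_mixed(void)  { return sizeof(struct mixed_order)  - (sizeof(int) + 2); }
size_t padding_sorted(void) { return sizeof(struct sorted_order) - (sizeof(int) + 2); }
```

On a typical ABI with a 4-byte int, struct mixed_order occupies 12 bytes and struct sorted_order 8, with x naturally aligned in both.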

39
Reporting Alignment Faults
  • Analyze alignment faults on Alpha prior to a port
  • Only works on the current process
  • sys$perm_report_align_fault
  • sys$perm_dis_align_fault_report
  • r align_test
    Address of x 10001
    %SYSTEM-I-ALIGN, data alignment trap, virtual address=0000000000010001,
     function=00000000, PC=000000001DCF0202, PS=0000001B
    %SYSTEM-I-ALIGN, data alignment trap, virtual address=0000000000010001,
     function=00000001, PC=000000001DCF0212, PS=0000001B
    %SYSTEM-I-ALIGN, data alignment trap, virtual address=0000000000010006,
     function=00000000, PC=000000001DCF0202, PS=0000001B
    %SYSTEM-I-ALIGN, data alignment trap, virtual address=0000000000010006,
     function=00000001, PC=000000001DCF0212, PS=0000001B
    %SYSTEM-I-ALIGN, data alignment trap, virtual address=000000000001000B,
     function=00000000, PC=000000001DCF0202, PS=0000001B
    %SYSTEM-I-ALIGN, data alignment trap, virtual address=000000000001000B,
     function=00000001, PC=000000001DCF0212, PS=0000001B
    %SYSTEM-I-ALIGN, data alignment trap, virtual address=0000000000010015,
     function=00000000, PC=000000001DCF0202, PS=0000001B

40
(No Transcript)
41
Process Affinity
  • Running on a large system with a low load?
  • Running on a large system with heavy load?
  • Better utilize the CPU caches (data cache,
    instruction cache & TB) by affinitizing your
    process to a set of CPUs
  • In an HT environment, affinitize to one core
  • Up to 25% performance increase

42
Generating Primes GS 1280 7/1150
EV7 has an EV68 core
43
Free Hot File Tracking Utility
  • sh mem/cache=(volume,topqio)
    System Memory Resources on 26-APR-2007 01:39:15.03
    Extended File Cache Top QIO File Statistics
    _$1$DGA642: (DISK$ES40), Caching mode is VIOC Compatible

    _$1$DGA642:[VMS$COMMON.SYSEXE]RIGHTSLIST.DAT;1 (open)
      Caching is enabled, active caching mode is Write Through
      Allocated pages         9    Total QIOs            107
      Read hits              92    Virtual reads         107
      Virtual writes          0    Hit rate               85 %
      Read aheads             0    Read throughs         107
      Write throughs          0    Read arounds            0
      Write arounds           0
    _$1$DGA642:[VMS$COMMON.SYSEXE]VMS$OBJECTS.DAT;2 (open)
      Caching is enabled, active caching mode is Write Through
      Allocated pages         0    Total QIOs              9

44
Free Hot File Tracking Utility
  • _$1$DGA242: (DISK$ITANIUMVMS), Caching mode is VIOC Compatible

    _$1$DGA242:[VMS$COMMON.SYSLIB]DECC$SHR.EXE;1 (open)
      Caching is enabled, active caching mode is Write Through
      Allocated pages       303    Total QIOs           1646
      Read hits            1561    Virtual reads        1646
      Virtual writes          0    Hit rate               94 %
      Read aheads             0    Read throughs        1642
      Write throughs          0    Read arounds            4
      Write arounds           0
    _$1$DGA242:[VMS$COMMON.SYSLIB]LIBRTL.EXE;1 (open)
      Caching is enabled, active caching mode is Write Through
      Allocated pages       143    Total QIOs           1165
      Read hits            1123    Virtual reads        1165
      Virtual writes          0    Hit rate               96 %
      Read aheads             0    Read throughs        1164
      Write throughs          0    Read arounds            1
      Write arounds           0

Avoid caching files that pollute the cache
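The hit rates in the listings above appear to be read hits divided by virtual reads, truncated to a whole percent (92 / 107 is 85.98%, displayed as 85). A sketch that reproduces the displayed values, assuming that formula; XFC's internal calculation is not documented here, and hit_rate_percent is a hypothetical name.

```c
/*
 * Whole-percent hit rate: read hits / virtual reads, truncated.
 * Returns 0 when there were no reads.
 */
unsigned hit_rate_percent(unsigned long read_hits, unsigned long virtual_reads)
{
    if (virtual_reads == 0)
        return 0;
    return (unsigned)(read_hits * 100 / virtual_reads);
}
```

Applied to RIGHTSLIST.DAT, DECC$SHR.EXE and LIBRTL.EXE it returns 85, 94 and 96, matching the displayed values.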
45
Elapsed time for I/Os
  • SDA> xfc show volume/brief

    Summary of XFC Cached Volumes (CVBs)
    ------------------------------------
                                         Open   Closed  Total      Read       Read       Write      -- Response (Milliseconds) --
    Volume Name      CVB                 Files  Files   I/Os       Hits       Count      Count      Hits     Disk     Average
    ---------------  ----------------    -----  ------  ---------  ---------  ---------  ---------  -------  -------  -------
    DISK$CARFAX      FFFFFFFEE01895E0    0      0       0          0          0          0          (N/A)    (N/A)    (N/A)
    DISK$UP          FFFFFFFEE0189380    0      0       0          0          0          0          (N/A)    (N/A)    (N/A)
    DISK$ORADAT      FFFFFFFEE0189120    26     3       1872255    0          0          1872255    (N/A)    0.0000   0.0000
    DISK$ORADSK      FFFFFFFEE0188EC0    73     177     22015701   14108183   21116834   898891     0.0232   0.5811   0.2236
    DISK$IA64_V82    FFFFFFFEE0188C60    0      0       0          0          0          0          (N/A)    (N/A)    (N/A)
    DISK$82SOURCE    FFFFFFFEE0188A00    0      0       1          0          1          0          (N/A)    (N/A)    (N/A)
    DISK$IT14_10292  FFFFFFFEE01887A0    2      0       0          0          0          0          (N/A)    (N/A)    (N/A)
    DISK$ES40        FFFFFFFEE0188540    4      3       27676052   27667501   27674665   1387       0.0118   0.4007   0.0120
    DISK$IT14_DOSD   FFFFFFFEE01882E0    0      0       0          0          0          0          (N/A)    (N/A)    (N/A)
    DISK$SYS831H1    FFFFFFFEE0188080    313    183     2736618    2668894    2713025    23594      0.0179   0.5425   0.0308
46
The XFC overhead
RDB users: consider disabling XFC caching of .RDA files
Elapsed time to copy 150MB file, rx2600, HSG80,
OpenVMS V8.3
47
IBM MQ series
  • MQ is a heavy user of pthreads
  • Set MULTITHREAD to 1
  • Thread manager upcalls are enabled; the creation
    of multiple kernel threads is disabled

48
Sizing Working Sets
  • Respect AUTOGEN but don't trust it blindly
  • AlphaServer ES47, 16GB RAM
  • Maximum process count of 2500 processes
  • AUTOGEN will set PQL_MWSDEFAULT to 17.38MB
  • 17.38MB x 2500 = 43.45GB of RAM
  • Exceeds physical memory almost 3 times over

49
Sizing Working Sets
  • It's not 1980 any more
  • Determine the size of the XFC cache + MPW_HILIMIT
  • Subtract the sum from the number of fluid pages
    on the system (MMG$GQ_FLUID_PGCNT)
  • Divide by the maximum number of processes that
    have ever been running on the system
    (PMS$GL_PROCCNTMAX)
  • Multiply the result by 16 to translate from pages
    to pagelets
  • If you are conservative, take 70% of the result
    and set the working set limit and quota to this value
  • Working set extent should be 3 times the result
  • Make sure PGFLQUOTA is properly sized
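The recipe above is plain arithmetic, and the factor of 16 is easy to misplace. A sketch in C, assuming all inputs are expressed in pages (convert any value reported in pagelets first) and an 8 KB page, i.e. 16 pagelets per page as the slide states. The struct and function names are hypothetical, and the values in the note are illustrative, not measurements.

```c
/*
 * Working-set sizing rule of thumb from the slide.
 * Inputs in pages; results in pagelets (512 bytes).
 */
struct ws_sizes {
    unsigned long per_process;   /* raw result, pagelets                  */
    unsigned long conservative;  /* 70%: working set limit and quota      */
    unsigned long extent;        /* 3 x the raw result: working set extent */
};

struct ws_sizes size_working_sets(unsigned long fluid_pages,
                                  unsigned long xfc_pages,
                                  unsigned long mpw_hilimit_pages,
                                  unsigned long max_processes)
{
    struct ws_sizes r = {0, 0, 0};
    unsigned long reserved = xfc_pages + mpw_hilimit_pages;

    if (max_processes == 0 || reserved >= fluid_pages)
        return r;                                    /* nothing left to share */

    unsigned long pages = (fluid_pages - reserved) / max_processes;
    r.per_process  = pages * 16;                     /* 8 KB page = 16 pagelets */
    r.conservative = r.per_process * 7 / 10;         /* 70% of the result       */
    r.extent       = r.per_process * 3;
    return r;
}
```

For example, with 2,000,000 fluid pages, a 262,144-page XFC, MPW_HILIMIT of 100,000 pages and at most 500 processes, each process gets 3,275 pages (52,400 pagelets); 70% of that is 36,680 pagelets and the extent is 157,200 pagelets.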

50
TCP/IP Gigabit Ethernet
  • Using Gigabit Ethernet?
  • Turn on Jumbo frames
  • Frames larger than 1518 bytes: more data per
    frame -> fewer frames -> fewer interrupts -> better
    performance
  • Must be supported by the switch
  • Must be configured before TCP/IP is started
  • mc lancp set dev ewa/jumbo
  • Bit 6 in SYSGEN parameter LAN_FLAGS
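To put numbers on "fewer frames", the sketch below counts the frames needed to carry the 286,026,004-byte file from the FTP slide. The per-frame payloads are assumptions: 1460 bytes of TCP data for a standard 1500-byte MTU and 8960 bytes for a 9000-byte jumbo MTU (IPv4 and TCP headers without options). frames_needed is a hypothetical helper.

```c
/*
 * Number of frames needed to carry `bytes` of TCP payload when each
 * frame carries at most `payload_per_frame` bytes (must be > 0).
 */
unsigned long frames_needed(unsigned long long bytes,
                            unsigned long payload_per_frame)
{
    return (unsigned long)((bytes + payload_per_frame - 1) / payload_per_frame);
}
```

That works out to 195,909 frames with standard frames versus 31,923 with jumbo frames, roughly six times fewer for the host to process; how many interrupts that saves depends on the adapter's interrupt coalescing.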

51
Toolbox Overview
  • Collection of highly valuable, undocumented &
    unsupported tools, subject to change without
    notice
  • Implemented as SDA extensions
  • Use hooks in the VMS executive
  • May be loaded and unloaded on the fly
  • No reboot required
  • Trace data is stored in a ring buffer in S2 space
  • May be viewed from a crash dump

52
Toolbox Overview

    Extension  Description                      First shipped in
    CNX        Connection manager tracing       V7.2-2
    EXC        Exception tracing                V8.2
    FC         Fibrechannel debug and tracing   V7.2-2
    FLT        Alignment fault tracing          V8.1
    IO         Buffered and direct I/O tracing  V7.3-2
    LCK        Lock manager tracing             V7.2-2
    LNM        Logical name tracing             V7.3-1
    MTX        Mutex tracing                    V7.3
    PCS        PC sampling                      V7.3-2
    PRF        Performance utility              V8.2
    PSH        Pshared debug utility            V8.2-1

53
Toolbox Overview

    Extension  Description                      First shipped in
    RDB        Rdb lock decoding and tracing    V7.3-2
    RMS        Indexed file tracing             V8.2-1
    SPL        Spinlock tracing                 V7.2-1H1
    TQE        Timer entry tracing              V7.3-1
    TR         Debug and trace prints           V7.3
    XFC        eXtended File Cache diagnostics  V7.3

54
Toolbox Overview
  • Common commands
  • SDA> xxx                  ! Displays brief command help
  • SDA> xxx LOAD
  • SDA> xxx START TRACE /BUFFER=3000
  • SDA> xxx SHOW TRACE
  • SDA> xxx STOP TRACE
  • SDA> xxx UNLOAD
  • SDA> READ /EXEC /NOLOG

55
PRF
  • PRF is a highly powerful SDA extension for
    monitoring various performance counters at the
    processor level.
  • May be used for PC sampling.
  • Highlights areas in the application that require
    performance enhancements.

56
PRF
  • SDA> prf load
  • PRFDEBUG load status 00000001
  • SDA> prf start pc/ind=21E004DA
  • PC Sampling started...
  • SDA> prf start collect
  • SDA>
  • Now run the application
  • r prime
  • ELAPSED: 0 00:00:24.16  CPU: 0:00:24.06
    BUFIO: 0  DIRIO: 0  FAULTS: 0
  • To look at the collected data
  • SDA> prf show collect

57
PRF SHOW COLLECT
    Start VA           End VA             Image                       Count   Percent
    -----------------  -----------------  --------------------------  ------  -------
    FFFFF802.11F00000  FFFFF802.11F01FFF  PRIME                       305113  99.85
    FFFFF802.A1000000  FFFFF802.A1015FFF  Kernel Promote VA           1       0.00
    FFFFFFFF.80000000  FFFFFFFF.800000FF  SYS$PUBLIC_VECTORS          2       0.00
    FFFFFFFF.80000100  FFFFFFFF.800111FF  SYS$BASE_IMAGE              2       0.00
    FFFFFFFF.80011200  FFFFFFFF.800651FF  SYS$PLATFORM_SUPPORT        258     0.08
    FFFFFFFF.800A0000  FFFFFFFF.801DD6FF  SYSTEM_PRIMITIVES           88      0.03
    FFFFFFFF.801DD700  FFFFFFFF.80243BFF  SYSTEM_SYNCHRONIZATION_MIN  9       0.00
    FFFFFFFF.80254600  FFFFFFFF.8026EFFF  SYS$EIDRIVER.EXE            5       0.00
    FFFFFFFF.8026F000  FFFFFFFF.802895FF  SYS$LAN.EXE                 2       0.00
    FFFFFFFF.80289600  FFFFFFFF.802BA1FF  SYS$LAN_CSMACD.EXE          2       0.00
    FFFFFFFF.80440E00  FFFFFFFF.8052B2FF  IO_ROUTINES                 1       0.00
    FFFFFFFF.8053A600  FFFFFFFF.80670DFF  PROCESS_MANAGEMENT          7       0.00
    FFFFFFFF.80670E00  FFFFFFFF.807759FF  SYS$VM                      11      0.00
    FFFFFFFF.80779500  FFFFFFFF.807C76FF  LOCKING                     1       0.00
    FFFFFFFF.807C7700  FFFFFFFF.807F9CFF  MESSAGE_ROUTINES            1       0.00

58
PRF SHOW COLLECT
  • SDA> prf show coll/thresh=2

    PC                 Count  Rate   Symbolization  Module  Offset
    -----------------  -----  -----  -------------  ------  --------
    FFFFF802.11F00170  63410  20.07  PRIME+10170    PRIME   00010170
                       GENERATE_PRIME+00000170 / GENERATE_PRIME+00000170
    FFFFF802.11F00190  6138   2.01   PRIME+10190    PRIME   00010190
                       GENERATE_PRIME+00000190 / GENERATE_PRIME+00000190
    FFFFF802.11F001A0  6761   2.21   PRIME+101A0    PRIME   000101A0
                       GENERATE_PRIME+000001A0 / GENERATE_PRIME+000001A0
    FFFFF802.11F00200  6296   2.06   PRIME+10200    PRIME   00010200
                       GENERATE_PRIME+00000200 / GENERATE_PRIME+00000200
    FFFFF802.11F00220  8102   2.65   PRIME+10220    PRIME   00010220
                       GENERATE_PRIME+00000220 / GENERATE_PRIME+00000220
    FFFFF802.11F00290  6804   2.23   PRIME+10290    PRIME   00010290

59
Montecito
Source: Wikipedia
60
Hyperthreading with Stalls vs Hyperthreading with No Stalls
61
Two Cores vs Hyperthreading (NoStalls)
62
HyperThreads Impact on Oracle Jobs
Elapsed time (minutes) to execute 7 jobs (less is better)
63
HyperThreads
  • HyperThreads have the potential of improving
    performance
  • Application has to meet the following criteria
  • COM queue (processes waiting for a CPU)
  • Poor locality (L2/L3 misses)
  • No pagefaulting
  • PRF may be used to track L2 misses
  • PRF START PROFILE/CPU=n/CACHE=L2/INDEX=PID
  • PRF START COLLECT

64
L2 Cache Misses on TC_CF (13.2% improvement)
                                                              I-Cache Misses      D-Cache Misses      Branch Trace Buf
    Start VA           End VA             Image               Latency   Percent   Latency   Percent   Count     Percent
    -----------------  -----------------  ------------------  --------  -------   --------  -------   --------  -------
    00000000.00000000  00000000.7ADCBFFF  Process Space       17062     1.73      6072893   96.52     244963    8.62
    00000000.7ADCC000  00000000.7AEF7FFF  DCL                 101       0.01      0         0.00      242       0.01
    FFFFF802.0806C000  FFFFF802.0825DFFF  LIBRTL              4104      0.42      1217      0.02      21753     0.77
    FFFFF802.0825E000  FFFFF802.08283FFF  LIBOTS              2150      0.22      123       0.00      240662    8.47
    FFFFF802.082E8000  FFFFF802.0837FFFF  SMGSHR              52        0.01      10        0.00      211       0.01
    FFFFF802.08404000  FFFFF802.0840DFFF  CMA$TIS_SHR         281       0.03      0         0.00      1504      0.05
    FFFFF802.08444000  FFFFF802.084F7FFF  DPML$SHR            5         0.00      0         0.00      1         0.00
    FFFFF802.084F8000  FFFFF802.085A9FFF  PTHREAD$RTL         2657      0.27      294       0.00      6315      0.22
    FFFFF802.085AA000  FFFFF802.090B3FFF  DECC$SHR            24027     2.43      6258      0.10      369765    13.02
    FFFFF804.0E000000  FFFFF804.0E015FFF  Kernel Promote VA   2232      0.23      0         0.00      5191      0.18
    FFFFFFFF.80000000  FFFFFFFF.800000FF  SYS$PUBLIC_VECTORS  403       0.04

65
L2 Cache Misses on PRIMES_1 (Slight Degradation)
                                                              I-Cache Misses      D-Cache Misses      Branch Trace Buf
    Start VA           End VA             Image               Latency   Percent   Latency   Percent   Count     Percent
    -----------------  -----------------  ------------------  --------  -------   --------  -------   --------  -------
    00000000.00000000  00000000.7ADCBFFF  Process Space       5077      2.77      29968     52.88     26607     5.27
    00000000.7ADCC000  00000000.7AEF7FFF  DCL                 19        0.01      0         0.00      22        0.00
    FFFFF802.0806C000  FFFFF802.0825DFFF  LIBRTL              949       0.52      570       1.01      3816      0.76
    FFFFF802.0825E000  FFFFF802.08283FFF  LIBOTS              63        0.03      0         0.00      201       0.04
    FFFFF802.082E8000  FFFFF802.0837FFFF  SMGSHR              20        0.01      0         0.00      46        0.01
    FFFFF802.08404000  FFFFF802.0840DFFF  CMA$TIS_SHR         0         0.00      0         0.00      6         0.00

66
LNM
  • The LNM extension allows tracking logical name
    translations.
  • Logical name translations are expensive from a
    performance point of view and should be avoided
    when possible.
  • MONITOR IO displays the total number of logical
    name translations per second
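One way to avoid repeated translations is to translate once and reuse the result. The sketch below assumes the program looks the name up through getenv(), which on OpenVMS falls back to logical names such as SYS$SCRATCH; whether the MANY_TRNLNMS test program used getenv() or a system service is not shown. cached_translation is a hypothetical helper.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Translate `name` once and return the saved copy on later calls with
 * the same name. Single-entry cache, not thread-safe: a sketch only.
 * Trade-off: a name redefined after the first call is not seen.
 */
const char *cached_translation(const char *name)
{
    static char *cached_name = NULL;
    static char *cached_value = NULL;

    if (cached_name != NULL && strcmp(cached_name, name) == 0)
        return cached_value;                 /* no new translation */

    const char *value = getenv(name);        /* the expensive lookup */
    if (value == NULL)
        return NULL;

    char *name_copy = malloc(strlen(name) + 1);
    char *value_copy = malloc(strlen(value) + 1);
    if (name_copy == NULL || value_copy == NULL) {
        free(name_copy);
        free(value_copy);
        return NULL;                         /* out of memory */
    }
    strcpy(name_copy, name);
    strcpy(value_copy, value);

    free(cached_name);                       /* replace the previous entry */
    free(cached_value);
    cached_name = name_copy;
    cached_value = value_copy;
    return cached_value;
}
```

Simply hoisting the lookup out of the loop in the caller achieves the same effect with no helper; a cache like this is mainly useful when the lookups are scattered across the code.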

67
LNM Example
  • SDA> lnm show collect
    Logical Name Trace Information
    -------------------------------
    Count         Logical Name
    ------------  -------------------------------
    5000          SYS$SCRATCH      ! SYS$SCRATCH is being translated 5000 times
    10            SYS$SHARE
    10            SYS$SYSROOT
    5             GBLINS8DDE9730
    5             SYS$COMMON
    4             GBLINS8DDAE310
    4             SYS$OUTPUT
    3             GBLINS8DDC20D0
    3             GBLINS8DDD1A60
    3             IPCACP_NETMBX
    2             CMA$TIS_SHR
    2             DPML$SHR
    2             LIBOTS
    2             LIBRTL

68
LNM Example
  • SDA> lnm show trace
    Logical Name Trace Information
    -------------------------------
    Timestamp               CPU  EPID      Main Image    CallerPC                                Logical Name
    ----------------------  ---  --------  ------------  --------------------------------------  -------------
    25-JAN 06:22:15.530026  01   21E0040E  IPCACP        FFFFFFFF.80514560 IOCTRANDEVNAM_C007C0  IPCACP_NETMBX
    25-JAN 06:22:05.530027  01   21E0040E  IPCACP        FFFFFFFF.80514560 IOCTRANDEVNAM_C007C0  IPCACP_NETMBX
    25-JAN 06:21:30.440094  00   21E004DA  MANY_TRNLNMS  00000000.00000000                       SYS$OUTPUT
    25-JAN 06:21:30.440010  00   21E004DA  MANY_TRNLNMS  00000000.00000000                       PAS$OUTPUT
    25-JAN 06:21:30.439846  00   21E004DA  MANY_TRNLNMS  00000000.00000000                       SYS$SCRATCH
    25-JAN 06:21:30.439835  00   21E004DA  MANY_TRNLNMS  00000000.00000000                       SYS$SCRATCH
    25-JAN 06:21:30.439825  00   21E004DA  MANY_TRNLNMS  00000000.00000000                       SYS$SCRATCH
    25-JAN 06:21:30.439814  00   21E004DA  MANY_TRNLNMS  00000000.00000000                       SYS$SCRATCH
    25-JAN 06:21:30.439803  00   21E004DA  MANY_TRNLNMS  00000000.00000000                       SYS$SCRATCH
    25-JAN 06:21:30.439792  00   21E004DA  MANY_TRNLNMS  00000000.00000000                       SYS$SCRATCH
    25-JAN 06:21:30.439782  00   21E004DA  MANY_TRNLNMS  00000000.00000000                       SYS$SCRATCH
    25-JAN 06:21:30.439771  00   21E004DA  MANY_TRNLNMS  00000000.00000000                       SYS$SCRATCH
    25-JAN 06:21:30.439760  00   21E004DA  MANY_TRNLNMS  00000000.00000000                       SYS$SCRATCH
    25-JAN 06:21:30.439750  00   21E004DA  MANY_TRNLNMS  00000000.00000000                       SYS$SCRATCH

69
LNM & Cobol
  • Do you have an application written in Cobol?
  • COB5644

70
Decoding PCs
  • New routine to decode a PC into module and routine
    names with offsets (IA64 only)
  • tfget_mod_rtn in module TRACE_ELF in
    SYS$SHARE:VMS$VOLATILE_PRIVATE_INTERFACES.OLB
  • tfget_mod_rtn ( entry->spltreq_pc, mod_name,
    rtn_name, mod_rel_pc, rtn_rel_pc )

71
Questions?
  • See us at www.maklee.com for
  • Performance improvements
  • Oracle Tuning
  • Platform Migration
  • Custom Engineering solutions
  • Custom Training