Monitoring and Tuning Oracle for z/OS and Oracle for z/Linux - PowerPoint PPT Presentation

About This Presentation
Title:

Monitoring and Tuning Oracle for z/OS and Oracle for z/Linux

Description:

Monitoring and Tuning Oracle for z/OS and Oracle for z/Linux Thomas Niewel Oracle Deutschland GmbH Thomas.Niewel_at_Oracle.com Agenda Tuning Why ? Reasons for bad ... – PowerPoint PPT presentation

Number of Views:541
Avg rating:3.0/5.0
Slides: 61
Provided by: G198
Category:

less

Transcript and Presenter's Notes

Title: Monitoring and Tuning Oracle for z/OS and Oracle for z/Linux


1
Monitoring and Tuning Oracle for z/OS andOracle
for z/Linux
2
Thomas NiewelOracle Deutschland
GmbHThomas.Niewel_at_Oracle.com
3
Agenda
  • Tuning Why ?
  • Reasons for bad Response Time
  • Statspack
  • Diagnosing reasons for bad response Times
  • SQL Tuning
  • TKPROF
  • Explain Plan
  • WLM

4
Why do we need to tune ?
  • Users report bad response times because of
  • CPU Time Wait Time
  • Poor performing queries
  • SQL-Tuning
  • bad Database parameters
  • Bottlenecks in System (Operating System, WLM,
    IO/Subsystem etc.)

5
What can be the reasons for bad Response Time
  • High CPU Usage
  • High I/O Usage
  • Memory Usage
  • Network problems
  • idle System
  • Operating System (WLM, VM)

6
Diagnose from the Oracle point of view
StatspackA short overview
7
Statspack a short overview
  • spcreate.sql - installs Statspack (run only
    once)
  • statspack.snap - data capture (procedure)
  • spreport.sql - reporting
  • spdoc.txt - user documentation
  • sppurge.sql - delete Statspack data
  • spdrop.sql - drop Statspack

8
Capturing data
  • Prerequisite timed_statisticstrue
  • Use stored procedure statspack.snap
  • SQLgt execute statspack.snap

9
Capturing data
  • Get a baseline for future comparisons
  • Capture snapshots
  • across peak load
  • across batch window
  • The time between snapshots should be lt 30
    minutes
  • Capture can be automated
  • Use OS utility e.g. cron
  • Use dbms_job
  • spauto.sql shipped as example

10
Reporting with Statspack
  • All data is held in an Oracle database
  • Report between two or more snapshots
  • cannot report across instance startup
  • Spreport.sql creates a report

11
Reporting with Statspack
  • SQLgt _at_spreport
  • DB Id DB Name Instance Instance
  • ----------- ---------- ---------- ----------
  • 1361567071 DB21 1 MAIL
  • Completed Snapshots
  • Instance DB Name SnapId Snap Started
    Snap Level
  • ---------- ---------- ------ -------------------
    --- ----------
  • DB21 DB21 1 17 Aug 2003
    100016 5
  • 2 17 Aug 2003
    103028 5
  • Enter beginning Snap Id 1
  • Enter ending Snap Id 2
  • Enter name of output file sp_1_2 ltenter
    name or returngt

12
Analyzing a Statspack report
  • Top down analysis
  • Summary page
  • Enviroment
  • Load profile
  • Instance efficiency
  • Shared pool usage
  • Top 5 Timed Events
  • Top SQL

13
Environment section
STATSPACK report for DB Name DB Id
Instance Inst Num Release Cluster
Host ------------ ----------- ------------
-------- ----------- ------- ------------ RECONPRD
1403107896 RECONPRD 1 9.2.0.2.0
NO lin390t1 Snap Id Snap
Time Sessions Curs/Sess Comment
------- ------------------ -------- ---------
------------------- Begin Snap 2 03-Mar-03
112801 10 5.1 End Snap 31
04-Mar-03 115804 17 5.5 Elapsed
30.05 (mins) Cache Sizes
(end) Buffer
Cache 256M Std Block Size
16K Shared Pool Size 48M
Log Buffer 128K
14
Load profile
  • Contains a number of common ratios
  • Allows characterisation of the application
  • Can point to problems
  • high hard parse rate
  • high IO rate
  • high login rate

15
Load profile
  • Useful if you have a comparable baseline
  • What has changed?
  • txn/sec change implies changed workload
  • redo size/txn implies changed transaction mix
  • physical reads/txn implies changed SQL or plan

16
Load profile
  • Load Profile
  • Per
    Second Per Transaction

  • --------------- ---------------
  • Redo size
    19,057.68 20,937.67
  • Logical reads
    2,408.15 2,645.70
  • Block changes
    98.64 108.37
  • Physical reads
    990.47 1,088.18
  • Physical writes
    6.92 7.61
  • User calls
    76.40 83.93
  • Parses
    7.08 7.78
  • Hard parses
    0.02 0.02
  • Sorts
    29.22 32.10
  • Logons
    24.73 27.17
  • Executes
    63.79 70.08
  • Transactions
    0.91
  • Blocks changed per Read 4.10 Recursive
    Call 72.76
  • Rollback per transaction 36.52 Rows
    per Sort 153.46

17
Instance Efficiency
  • Gives an overview of how the instance is
    performing
  • Can also be used with a comparable baseline
  • Shared pool Statistics allow quick identification
    of cursor sharing problems

18
Instance Efficiency
Instance Efficiency Percentages (Target
100)
Buffer Nowait 99.99 Redo
NoWait 99.97 Buffer Hit
59.00 In-memory Sort 99.99
Library Hit 99.94 Soft Parse
99.69 Execute to Parse 88.89
Latch Hit 99.98 Parse CPU to Parse Elapsd
56.55 Non-Parse CPU 99.93 Shared
Pool Statistics Begin End
------ ------
Memory Usage 38.86 66.81 SQL with
executionsgt1 43.41 87.22 Memory for SQL
w/execgt1 39.28 80.21
19
Top 5 Timed Events
  • CPU time real work
  • Shows where Oracle sessions are waiting
  • Compare Wait Time to elapsed time
  • Total Wait Time shows potential benefits
  • Use as basis for directed drilldown

Total Event
Waits Time (s) Ela Time -----------------------
-------- ------------ ----------- -------- CPU
time
78,588 50.24 enqueue
1,560,523 59,961 38.33 db file
sequential read 1,635,253 6,324
4.04 db file scattered read
14,620,725 5,907 3.78 control file
parallel write 32,816 1,396
.89
20
Top 5 Timed Events
  • Sample drilldowns
  • CPU Time on CPU
  • enqueue e.g TX Enqueue
  • db file sequential read
  • Index Access
  • db file scattered read
    Scan Operationscontrol file parallel write

21
Top SQL
  • Helps to find problem statements
  • SQL ordered by Gets
  • SQL ordered by Reads
  • SQL ordered by Executions
  • SQL ordered by Parse Calls

22
Top SQL

CPU Elapsd Buffer Gets Executions
Gets per Exec Total Time (s) Time (s) Hash
Value --------------- ------------ --------------
------ -------- --------- ----------
79,562,398 8,114 9,805.6 34.6
27182.71 28127.71 1525844323 Module
SQLPlus SELECT MAX(STMT_BKG_DATE_CLOSE) FROM
GAH_T_STATEMENTS WHERE S TMT_ACCT_ID b1 AND
((b2 b3 AND STMT_CARRIER ! b4 AND
STMT_MSG_TYPE ! b5 AND (b6 IS NULL OR b6
STMT_CARRIER ) AND ((b8 IS NULL AND
STMT_MSG_TYPE ! b9 ) OR (b8 IS NOT NU LL AND
b8 STMT_MSG_TYPE ))) OR (b2 b13 AND
STMT_CARRIER
23
I/O Statistics
  • Help to find I/O Problems
  • Tablespace IO Stats
  • File IO Stats

24
I/O Statistics
Tablespace ------------------------------
Av Av Av Av
Buffer Av Buf Reads Reads/s
Rd(ms) Blks/Rd Writes Writes/s Waits
Wt(ms) -------------- ------- ------ -------
------------ -------- ----------
------ GAH_TS00_DT_MEDIUM 15,242,896 160
0.4 6.1 41,066 0 22,468
18.4 GAH_TS00_IX_ITEM 210,346 2
11.2 1.0 130,299 1 9
15.6 GAH_TS00_IX_MEDIUM 207,433 2
6.9 1.0 86,699 1 39
43.8 RECONPRD_TS00_TEMP 185,865 2
1.7 1.6 101,560 1 0
0.0 GAH_TS00_IX_ITEM_REF 155,027 2
8.4 1.0 34,867 0 1
0.0
25
Diagnosing high CPU usage
  • High CPU Usage
  • High I/O utilization
  • Memory Usage
  • Network problems
  • idle System
  • Operating System (WLM, VM)

26
Diagnosing high CPU usage-Operating System-
  • Linux/390
  • sar -u 3 3333
  • iostat -x 3
  • vmstat 3
  • top
  • Etc.
  • Z/OS
  • SDSF
  • RMF
  • Omegamon
  • etc.

27
Diagnosing high CPU usage
  • What can be the reason for high CPU Usage ?
  • Shared_Pool / SQL-Cache
  • db_file_multiblock_read_count
  • Buffer_Cache/ Buffer_Pool
  • How can Statements with a great of buffergets
    be seperated ?
  • Statspack
  • SQL Script

28
Diagnosing high CPU usage

CPU Elapsd Buffer Gets Executions
Gets per Exec Total Time (s) Time (s) Hash
Value --------------- ------------ --------------
------ -------- --------- ----------
4,494,662 155 28,997.8 2.0
1049.63 2414.11 3961361411 SELECT FROM
GAH_T_STATEMENTS WHERE STMT_ACCT_ID b1
AND ((b2 b3 AND STMT_CARRIER ! b4 AND
STMT_MSG_TYPE ! b5 AND (b6 IS NULL OR b6
STMT_CARRIER ) AND ((b8 IS NULL AND
STMT_MSG_TYPE ! b9 ) OR (b8 IS NOT NULL AND
b8 STMT_MSG_ TYPE ))) OR (b2 b13 AND
STMT_CARRIER b14 AND STMT_MSG_T Module
SQLPlus
29
Diagnosing high CPU usage
spool cpu_users.lstselect buffer_gets,disk_reads
,executions,ratio_to_report(buffer_gets) over ()
100 buffer_ratio,ratio_to_report(disk_reads)
over () 100 disk_ratio,sql_text from
vsqlareaorder by buffer_ratio descspool off
30
Diagnosing high CPU usage
BUFFER_GETS DISK_READS EXECUTIONS BUFFER_RATIO
DISK_RATIO ----------- ---------- ----------
------------ ---------- SQL_TEXT -----------------
--------------------------------------------------
--------------------- 19564429 154
46908 65.9945773 5.40350877 select t.schema,
t.name, t.flags, q.name from system.aq_queue_tabl
es t, ys.aq_queue_table_affinities aft,
system.aq_queues q where aft.table_objno
t.objno and aft.owner_instance 1 and
q.table_objno t.objno and q.usage 0 and
bitand(t.flags, 4163264128256) 0 for
update of t.name, aft.table_objno skip locked
31
SQL Tuning
  • Check Object Statsitics
  • Use DBMS_STATS
  • Analyze Execution Plan
  • Explain Query / VSQL_PLAN
  • Optimize Query
  • Optimize Indexes
  • Index Only Access, Function Based Indexes

32
Diagnose
  • High CPU Usage
  • High I/O utilization
  • Memory Usage
  • Network problems
  • idle System
  • Operating System (WLM, VM)

33
High I/O utilization
  • Linux/390
  • sar -d 3 33333
  • iostat -x 3
  • vmstat 3
  • Z/OS
  • RMF
  • Omegamon etc

34
High I/O utilization
  • Disk I/O
  • Disk access is slower than memory access (Factor
    5000 to 100000)
  • One physical disk is able to perform 100-150
    I/Os per Second
  • Disk Reponse Times (Read operations)
  • 2ms (Read from disk cache)
  • 10ms 15ms (Physical Reads)

35
High I/O utilization
  • Reasons for High I/O utilization
  • Database Cache too small (DB_CACHE_SIZE)
  • Sortarea too small (sort_area_size)
  • Hasharea too small (hash_area_size)
  • Too many Checkpoints
  • Ineffective Execution Plans (e.g.
    Full-Table-Scans which are not necessary)

36
High I/O utilization
  • Increase Cache Size
  • Reduces physical I/O Operations
  • Z/OS
  • Limited by 31 Bit Arcitecture
  • Multiple Adress Spaces help to improve the Memory
    management

37
High I/O utilization
  • An Oracle server instance has a single SGA
    regardless of the number of address spaces
    or regions configured.
  • The user context is distributed across all AS

38
High I/O utilization
  • Linux/390
  • The default maximum SGA size on Linux/390 is 750
    MB without changing the base adress
  • the maximum SGA size to 1 GB by changing the SGA
    base address

39
High I/O utilization
Top 5 Timed Events

Total Event
Waits Time (s)
Wt Time ------------------------------------------
-- ------------ ------------ ------- db file
sequential read 89,086,819
11,009 93.13 db file scattered read
9,875,076 776
6.56 file open
505,227 23 .19 log file sync
440,409
8 .07 latch free
11,042,510 3 .03
40
High I/O utilization
Tablespace I/O Stats
Tablespace Av Av Av
Av Buffer Av Buf
Reads Reads/s Rd(ms) Blks/Rd Writes
Writes/s Waits Wt(ms) -------------- -------
------ ------- ------------ -------- ----------
------ RECEIVABLE_T_01 18,398,460 213
12.0 1.6 59,325 1 4,892,686
0.0 SO_T_03 6,827,475 79 13.2 1.6
27,462 0 4,506 0.0 SO_I_01
5,356,393 62 9.0 1.3 18,388
0 35,935 0.0 PO_I_01 4,641,732
1400 21.7 1.8 72,563 1
217,799 0.0
41
High I/O utilization
RMF Report (Monitor 1 RMF Postprocessor)
D I R E C T
A C C E S S D E V I C E A C T I V I T Y
DEVICE AVG
AVG AVG AVG AVG AVG AVG AVG
AVG STORAGE DEV DEVICE
VOLUME LCU ACTIVITY RESP IOSQ DPB CUB DB
PEND DISC CONN DEV DEV DEV NUMBER ANY
MT GROUP NUM TYPE SERIAL RATE
TIME TIME DLY DLY DLY TIME TIME TIME CONN
UTIL RESV ALLOC ALLOC PEND DBORACLE 7651
33903 LEOR00 008F 0.817 4 0 0.0
0.0 0.0 0.2 2.6 0.8 0.06 0.28 0.0
1.0 100.0 0.0 DBORACLE 7652 33903 LEOR01
008F 0.878 9 0 0.0 0.0 0.0 0.2
0.3 8.7 0.76 0.79 0.0 3.0 100.0 0.0
DBORACLE 7653 33903 LEOR02 008F 0.502 2
0 0.0 0.0 0.0 0.2 0.0 1.5 0.08 0.08
0.0 6.0 100.0 0.0 DBORACLE 7654 33903
LEOR03 008F 108.968 56 52 0.0 0.0 0.0
0.2 2.4 0.8 0.08 0.32 0.0 1.0 100.0
0.0 DBORACLE 7655 33903 LEOR04 008F 0.828
3 0 0.0 0.0 0.0 0.2 2.3 0.8 0.06
0.25 0.0 1.0 100.0 0.0 DBORACLE 7656
33903 LEOR05 008F 98.779 50 48 0.0
0.0 0.0 0.2 1.7 0.8 0.13 0.42 0.0
1.0 100.0 0.0 DBORACLE 7657 33903 LEOR06
008F 2.768 2 0 0.0 0.0 0.0 0.3
1.3 0.7 0.20 0.56 0.0 1.0 100.0 0.0
DBORACLE 7658 33903 LEOR07 008F 0.943 3
0 0.0 0.0 0.0 0.2 2.3 0.7 0.07 0.28
0.0 1.0 100.0 0.0 DBORACLE 7659 33903
LEOR08 008F 1.003 4 0 0.0 0.0 0.0
0.2 3.5 0.8 0.08 0.43 0.0 1.0 100.0
0.0 DBORACLE 765A 33903 LEOR09 008F 0.945
3 0 0.0 0.0 0.0 0.2 2.2 0.8 0.07
0.28 0.0 1.0 100.0 0.0 DBORACLE 765B
33903 LEOR0A 008F 0.217 3 0 0.0
0.0 0.0 0.2 2.2 0.8 0.02 0.06 0.0
1.0 100.0 0.0 DBORACLE 765C 33903 LEOR0B
008F 0.833 4 0 0.0 0.0 0.0 0.2
2.5 0.8 0.06 0.28 0.0 2.0 100.0 0.0
DBORACLE 765D 33903 LEOR0C 008F 0.963 4
0 0.0 0.0 0.0 0.2 2.7 0.9 0.09 0.35
0.0 1.0 100.0 0.0 DBORACLE 765E 33903
LEOR0D 008F 0.013 3 0 0.0 0.0 0.0
0.2 2.6 0.5 0.00 0.00 0.0 1.0 100.0
0.0 DBORACLE 765F 33903 LEOR0E 008F 0.935
4 0 0.0 0.0 0.0 0.2 3.0 0.8 0.07
0.35 0.0 1.0 100.0 0.0
42
High I/O utilization
  • RMF Report Explanations
  • IOSQ TIME UCB Queueing time
  • Avg Pend Time ms, all Paths to logical volume
    are busy
  • AVG Resp Time Connect Time Dicsonnect Time
    Pending Time IOSQ

43
SQL Tuning
  • Check Object Statsitics
  • Use DBMS_STATS
  • Analyze Execution Plan
  • Explain Query
  • Optimize Query
  • Optimize Indexes
  • Index Only Access, Function Based Indexes

44
Diagnose
  • High CPU Usage
  • High I/O utilization
  • Memory Usage
  • Network problems
  • idle System
  • Operating System (WLM, VM)

45
Memory Problems
  • How to determine Paging/Swapping
  • Linux/390
  • VMSTAT
  • Z/OS
  • RMF
  • OMEGAMON
  • Reasons for Paging/Swapping
  • Too many processes/users
  • Database Parameters which are too generously
  • DB_CACHE_SIZE
  • HASH_SIZE
  • SQL_CACHE

46
Diagnosing high CPU usage-Operating System-
  • High CPU Usage
  • High I/O utilization
  • Memory Usage
  • Network problems
  • idle System
  • Operating System (WLM, VM)

47
Diagnosing Network problems
  • Latency
  • LAN lt 1ms
  • WAN lt 10ms - 500ms
  • ISDN lt 50ms
  • VPN 100-500 ms
  • Badwidth
  • 11-18 Mbit (Copper)
  • 100 Mbit (Copper, fibre)
  • 1 Gbit (fibre)
  • Great number of small packets
  • tcp_nodelay
  • SDU, TDU-Parameters (not available on z/os)

48
Diagnosing high CPU usage-Operating System-
  • High CPU Usage
  • High I/O utilization
  • Memory Usage
  • Network problems
  • idle System
  • Operating System (WLM, VM)

49
Idle System
  • One CPU is 100 used All other CPUs are idle
  • Reason
  • dedicated Server
  • Only one process is running
  • Solution
  • Parallel Query
  • Not useful for OLTP Aplications
  • Split work - run more Processes

50
Idle System
  • Latch Contentions
  • Use Statspack to diagnose
  • Enqueue Waits
  • Use Statspack to diagnose
  • Often Block Contentions because of too small
    initrans, Freelist, Freelist goup settings
  • Parsing because the use of Literals
  • Use Statspack to diagnose
  • Use CURSOR SHARING
  • Use Bind Variables

51
Idle System
Top 5 Timed Events

Total Event
Waits Time (s) Ela Time ---------------
----------------------------- ------------
----------- -------- enqueue 1,560,523
78,588 50.24 CPU time
59,961 38.33 db file
sequential read 1,635,253
6,324 4.04 db file scattered read
14,620,725 5,907
3.78 control file parallel write
32,816 1,396 .89
------------------------------------------------
-------------
52
Idle System
Enqueue activity for DB RECONPRD Instance
RECONPRD Snaps 2 -31 -gt Enqueue stats gathered
prior to 9i should not be compared with 9i
data -gt ordered by Wait Time desc, Waits desc

Avg Wt Wait Eq Requests Succ
Gets Failed Gets Waits Time (ms) Time
(s) -- ------------ ------------ -----------
----------- ------------- ------------ TX
438,961 438,941 20 114
512,902.49 58,471 TC 34,530
34,530 0 6,904 369.61
2,552 PS 9,526,323 9,386,524 139,799
1,517,315 .25 381 CF
42,761 42,751 10 23
897.57 21 CI 55,594 55,594
0 12 6.17
0 HW 11,356 11,356 0
8 .13 0
----------------------------------------------
---------------
53
SQL-Tuning
  • Prerequisites
  • Use Cost based optimizer
  • DBMS_STATS (important)
  • Explain Query
  • Create Plan Table UTLXPLAN
  • Visualize Execution Plan
  • UTLXPLS
  • UTLXPLP
  • Note Scripts are located in
    xxxxxxxx.yyyyyyyy.SQL library (z/OS)
  • ORACLE_HOME/rdbms/admin (Linux/Unix)

54
SQL-Tuning
  • SQLgt explain plan for select a. from scott.emp
    a, scott.dept b where a.deptnob.deptno
  • Explained.
  • SQLgt save explain
  • Created file explain.sql
  • SQLgt _at_?/rdbms/admin/utlxpls
  • PLAN_TABLE_OUTPUT
  • --------------------------------------------------
    ------------------------------
  • --------------------------------------------------
    ------------------
  • Id Operation Name Rows
    Bytes Cost
  • --------------------------------------------------
    ------------------
  • 0 SELECT STATEMENT
    14 560 2
  • 1 NESTED LOOPS
    14 560 2
  • 2 TABLE ACCESS FULL EMP
    14 518 2
  • 3 INDEX UNIQUE SCAN PK_DEPT
    1 3
  • --------------------------------------------------
    ------------------

55
SQL-Tuning
  • Optimizer features which help to improve
    execution plans
  • Function based indexes (very important)
  • SELECT From emp where upper(ename) SMITH
  • Bitmap indexes (Useful in case of Read Only)
  • Useful for Low Cardinality columns
  • Parameter Optimizer_index_cost_adj
  • Optimizer access path selection can be adjusted
    to be more index friendly

56
SQL-Tuning
  • SQLTRACE
  • Prerequisite timed_statisticstrue
  • Activate
  • Alter Session set SQL_tracetrue
  • dbms_system.set_sql_trace_in_session
  • Use TKPROF to show execution statistics
  • sysno,explainuid/pw

57
z/OS WLM
  • Everything works fine without peaks (e.g.CPU 30)
  • Common Problems we had with WLM(during peak
    periods)
  • The Everything is important syndrom
  • User didnt classify any discretionary goals
  • Everything had the same importance
  • Enclave(Sess) with response time goals
  • Enclave goes to last period (which was
    discretionary) shortly after Logon
  • No default service class for OSDI
  • Mistake in classification rules will result in
    SYSOTHER being used discretionary goal

58
Oracle for Linux /390
  • We had tuning work
  • Linux on an LPAR
  • Linux under VM
  • We did not have any VM related problems
  • The reasons for performance bottlenecks were
  • Execution plan of a few SQL Queries
  • I/O Subsystem

59
Oracle for z/OS
  • The reasons for performance bottlenecks were
  • WLM configuration
  • Execution plan of a few SQL Queries
  • I/O Subsystem
  • Variances in disc response time

60
?
Write a Comment
User Comments (0)
About PowerShow.com