More Examples of Interpreting Wait Events To Boost System Performance - PowerPoint PPT Presentation

1 / 122
About This Presentation
Title:

More Examples of Interpreting Wait Events To Boost System Performance

Description:

More Examples of Interpreting Wait Events To Boost System Performance ... The instance collects cumulative statistics about events waited upon since instance startup. ... – PowerPoint PPT presentation

Number of Views:169
Avg rating:3.0/5.0
Slides: 123
Provided by: rogerschra
Category:

less

Transcript and Presenter's Notes

Title: More Examples of Interpreting Wait Events To Boost System Performance


1
More Examples of Interpreting Wait Events To
Boost System Performance
Roger Schrag and Terry Sutton Database
Specialists, Inc. www.dbspecialists.com
2
Session Objectives
  • Briefly introduce wait events
  • Define wait events
  • Discuss how to use the wait event interface
  • Walk through five examples of how wait event
    information was used to troubleshoot production
    problems

3
Wait Event Defined
  • We say an Oracle process is busy when it wants
    CPU time.
  • When an Oracle process is not busy, it is waiting
    for something to happen.
  • There are only so many things an Oracle process
    could be waiting for, and the kernel developers
    at Oracle have attached names to them all.
  • These are wait events.

4
Wait Event Examples
  • An Oracle process waiting for the client
    application to submit a SQL statement waits on a
    SQLNet message from client event.
  • An Oracle process waiting on another session to
    release a row-level lock waits on an enqueue
    event.

5
Wait Event Interface
  • Each Oracle process identifies the event it is
    waiting for each time a wait begins.
  • The instance collects cumulative statistics about
    events waited upon since instance startup.
  • You can access this information through v views
    and a wait event tracing facility.
  • These make up the wait event interface.

6
Viewing Wait Events
http//dbrx.dbspecialists.com/pls/dbrx/view_report
7
Why Wait Event Information Is Useful
  • Wait events touch all areas of Oraclefrom I/O to
    latches to parallelism to network traffic.
  • Wait event data can be remarkably detailed.
    Waited 0.02 seconds to read 8 blocks from file
    42 starting at block 18042.
  • Analyzing wait event data will yield a path
    toward a solution for almost any problem.

8
Important Wait Events
  • There were 158 wait events in Oracle 8.0.
  • There are 363 wait events in Oracle 9i Release 2
    (9.2.0).
  • Most come up infrequently or are rarely
    significant for troubleshooting performance.
  • Different wait events are significant in
    different environments, depending on which Oracle
    features have been deployed.

9
A Few Common Events
buffer busy waits library cache load lock
control file parallel write library cache pin
control file sequential read log buffer space
db file parallel read / write log file parallel
write db file scattered read log file
sequential read db file sequential read log
file switch completion direct path read /
write log file sync enqueue undo segment
extension free buffer waits write complete
waits latch free
10
Idle Events
  • Sometimes an Oracle process is not busy simply
    because it has nothing to do.
  • In this case the process will be waiting on an
    event that we call an idle event.
  • Idle events are usually not interesting from the
    tuning and troubleshooting perspective.

11
Common Idle Events
client message PX Deq Execute
Reply dispatcher timer PX Deq Execution
Msg gcs for action PX Deq Signal ACK gcs
remote message PX Deq Table Q Normal ges
remote message PX Deque wait i/o slave
wait PX Idle Wait jobq slave wait queue
messages lock manager wait for remote
message rdbms ipc message null event slave
wait parallel query dequeue smon timer pipe
get SQLNet message from client PL/SQL lock
timer SQLNet message to client pmon
timer SQLNet more data from client PX Deq
Credit need buffer virtual circuit status PX
Deq Credit send blkd wakeup time manager
12
Accounted for by the Wait Event Interface
  • Time spent waiting for something to do (idle
    events)
  • Time spent waiting for something to happen so
    that work may continue (non-idle events)

13
Not Accounted for by the Wait Event Interface
  • Time spent using a CPU
  • Time spent waiting for a CPU
  • Time spent waiting for virtual memory to be
    swapped back into physical memory
  • Time spent on CPU-intensive activities
  • Logical reads
  • Spinning while waiting for latches
  • Statement parsing

14
Timed Statistics
The wait event interface will not collect timing
information unless timed statistics are enabled.
  • Enable timed statistics dynamically at the
    instance or session level
  • ALTER SYSTEM SET timed_statistics TRUE
  • ALTER SESSION SET timed_statistics TRUE
  • Enable timed statistics at instance startup by
    setting the instance parameter
  • timed_statistics true

15
The Wait Event Interface
  • Dynamic performance views
  • vsystem_event
  • vsession_event
  • vevent_name
  • vsession_wait
  • Wait event tracing

16
The vsystem_event View
  • Shows one row for each wait event name, along
    with cumulative statistics since instance
    startup. Wait events that have not occurred at
    least once since instance startup do not appear
    in this view.
  • Column Name Data Type
  • -------------------------- ------------
  • EVENT VARCHAR2(64)
  • TOTAL_WAITS NUMBER
  • TOTAL_TIMEOUTS NUMBER
  • TIME_WAITED NUMBER
  • AVERAGE_WAIT NUMBER
  • TIME_WAITED_MICRO NUMBER

17
Columns In vsystem_event
  • EVENT The name of a wait event
  • TOTAL_WAITS Total number of times a process has
    waited for this event since instance startup
  • TOTAL_TIMEOUTS Total number of timeouts while
    waiting for this event since instance startup
  • TIME_WAITED Total time waited for this wait
    event by all processes since startup (in
    centiseconds)
  • AVERAGE_WAIT The average length of a wait for
    this event since instance startup (in
    centiseconds)
  • TIME_WAITED_MICRO Same as TIME_WAITED but in
    microseconds (Oracle 9i)

18
Sample vsystem_event Query
  • SQLgt SELECT event, time_waited
  • 2 FROM vsystem_event
  • 3 WHERE event IN ('smon timer',
  • 4 'SQLNet message from client',
  • 5 'db file sequential read',
  • 6 'log file parallel write')
  • EVENT TIME_WAITED
  • --------------------------------- -----------
  • log file parallel write 159692
  • db file sequential read 28657
  • smon timer 130673837
  • SQLNet message from client 16528989

19
The vsession_event View
  • Shows one row for each wait event name within
    each session, along with cumulative statistics
    since session start.
  • Column Name Data Type
  • -------------------------- ------------
  • SID NUMBER
  • EVENT VARCHAR2(64)
  • TOTAL_WAITS NUMBER
  • TOTAL_TIMEOUTS NUMBER
  • TIME_WAITED NUMBER
  • AVERAGE_WAIT NUMBER
  • MAX_WAIT NUMBER
  • TIME_WAITED_MICRO NUMBER

20
Columns In vsession_event
  • SID The ID of a session (from vsession)
  • EVENT The name of a wait event
  • TOTAL_WAITS Total number of times this session
    has waited for this event
  • TOTAL_TIMEOUTS Total number of timeouts while
    this session has waited for this event
  • TIME_WAITED Total time waited for this event by
    this session (in centiseconds)
  • AVERAGE_WAIT The average length of a wait for
    this event in this session (in centiseconds)
  • MAX_WAIT The maximum amount of time the session
    had to wait for this event (in centiseconds)

21
Sample vsession_event Query
  • SQLgt SELECT event, total_waits, time_waited_micro
  • 2 FROM vsession_event
  • 3 WHERE SID
  • 4 (SELECT sid FROM vsession
  • 5 WHERE audsid
  • 6 USERENV ('sessionid') )
  •  
  • EVENT WAITS
    TIME_WAITED_MICRO
  • --------------------------- -----
    -----------------
  • db file sequential read 552
    2409173
  • db file scattered read 41
    315928
  • SQLNet message to client 73
    347
  • SQLNet message from client 72
    3397382712

22
Oracle 9i Bug 2429929
  • SQLgt SELECT event, total_waits, time_waited_micro
  • 2 FROM vsession_event
  • 3 WHERE SID 1
  • 4 (SELECT sid FROM vsession
  • 5 WHERE audsid
  • 6 USERENV ('sessionid') )
  •  
  • EVENT WAITS
    TIME_WAITED_MICRO
  • --------------------------- -----
    -----------------
  • db file sequential read 552
    2409173
  • db file scattered read 41
    315928
  • SQLNet message to client 73
    347
  • SQLNet message from client 72
    3397382712

23
The vevent_name View
  • Shows one row for each wait event name known to
    the Oracle kernel, along with names of up to
    three parameters associated with the wait event.
  • Column Name Data Type
  • -------------------------- ------------
  • EVENT NUMBER
  • NAME VARCHAR2(64)
  • PARAMETER1 VARCHAR2(64)
  • PARAMETER2 VARCHAR2(64)
  • PARAMETER3 VARCHAR2(64)

24
Columns In vevent_name
  • EVENT An internal ID
  • NAME The name of a wait event
  • PARAMETERn The name of a parameter associated
    with the wait event

25
Sample vevent_name Query
  • SQLgt SELECT
  • 2 FROM vevent_name
  • 3 WHERE name 'db file scattered read'
  •  
  • EVENT NAME
  • ---------- ------------------------------
  • PARAMETER1 PARAMETER2 PARAMETER3
  • ------------- ------------- -------------
  • 95 db file scattered read
  • file block blocks

26
The vsession_wait View
  • Shows one row for each session, providing
    detailed information about the current or most
    recent wait event.
  • Column Name Data Type
  • -------------------------- ------------
  • SID NUMBER
  • SEQ NUMBER
  • EVENT VARCHAR2(64)
  • P1TEXT VARCHAR2(64)
  • P1 NUMBER
  • P1RAW RAW(4)
  • P2TEXT VARCHAR2(64)
  • P2 NUMBER
  • P2RAW RAW(4)
  • P3TEXT VARCHAR2(64)
  • P3 NUMBER
  • P3RAW RAW(4)
  • WAIT_TIME NUMBER
  • SECONDS_IN_WAIT NUMBER
  • STATE VARCHAR2(19)

27
Columns In vsession_wait
  • SID The ID of a session
  • SEQ A number that increments by one on each new
    wait
  • STATE An indicator of the session status
  • WAITING The session is currently waiting, and
    details of the wait event are provided.
  • WAITED KNOWN TIME The session is not waiting,
    but information about the most recent wait is
    provided.
  • WAITED SHORT TIME or WAITED UNKNOWN TIME The
    session is not waiting, but partial information
    about the most recent wait is provided.

28
Columns In vsession_wait (cont.)
  • EVENT The name of a wait event
  • PnTEXT The name of a parameter associated with
    the wait event
  • Pn The value of the parameter in decimal form
  • PnRAW The value of the parameter in raw form
  • WAIT_TIME Length of most recent wait (in
    centiseconds) if STATE WAITED KNOWN TIME
  • SECONDS_IN_WAIT How long current wait has been
    so far if STATE WAITING

29
Sample vsession_wait Query
  • SQLgt SELECT FROM vsession_wait WHERE sid 16
  •  
  • SID SEQ EVENT
  • ---- ----- ------------------------------
  • P1TEXT P1 P1RAW P2TEXT P2 P2RAW
  • ------ ---- -------- ------ ---- --------
  • P3TEXT P3 P3RAW WAIT_TIME SECONDS_IN_WAIT
  • ------ ---- -------- --------- ---------------
  • STATE
  • -------------------
  • 16 303 db file scattered read
  • file 17 00000011 block 2721 00000AA1
  • blocks 8 00000008 -1 0
  • WAITED SHORT TIME

30
Tracing Wait Event Activity
Using the dbms_support package or setting debug
event 10046 enables SQL trace, and can optionally
include wait event information and bind variable
data in trace files as well.
  • Methods for setting debug events
  • ALTER SESSION SET events
  • oradebug
  • dbms_system.set_ev

31
Activating Wait Event Tracing
  • dbms_support is missing from many releases of
    Oracle 8i, but is available as a patch.
  • dbms_support is not installed by default run
    dbmssupp.sql in ?/rdbms/admin to install it.
  • dbms_system.set_ev is not supported by Oracle
    Corporation because it lets you set any debug
    event and some can put your database at risk.
  • Tracing imposes serious system overhead, so trace
    only what you need.

32
Debug Event 10046 Settings
  • ALTER SESSION SET events
  • '10046 trace name context forever, level N'

33
Sample Oracle 8i Trace Output
  • PARSING IN CURSOR 1 len80 dep0 uid502 oct3
    lid502
  • tim2293771931 hv2293373707 ad'511dca20'
  • SELECT / FULL / SUM (LENGTH(notes))
  • FROM customer_calls
  • WHERE status x
  • END OF STMT
  • PARSE 1c0,e0,p0,cr0,cu0,mis1,r0,dep0,og
    0,tim2293771931
  • BINDS 1
  • bind 0 dty2 mxl22(22) mal00 scl00 pre00
    oacflg03 oacfl20
  • size24 offset0
  • bfp09717724 bln22 avl02 flg05
  • value43
  • EXEC 1c0,e0,p0,cr0,cu0,mis0,r0,dep0,og4
    ,tim2293771931
  • WAIT 1 nam'SQLNet message to client' ela 0
    p1675562835 p21 p30
  • WAIT 1 nam'db file scattered read' ela 3
    p117 p2923 p38
  • WAIT 1 nam'db file scattered read' ela 1
    p117 p2931 p38
  • WAIT 1 nam'db file scattered read' ela 2
    p117 p2939 p38
  • WAIT 1 nam'db file sequential read' ela 0
    p117 p2947 p31

34
Wait Event Tracing Enhancements In Oracle 9i
  • The dbms_support package is provided for easier
    trace activation.
  • Elapsed times in the trace file are shown in
    microseconds instead of centiseconds.
  • A waitsyes option has been added to TKPROF to
    include wait event statistics in the TKPROF
    report.

35
Using Wait Event Information
  • Five examples of how wait event information was
    used to diagnose production problems

36
Example 1 Buffer Busy Waits
  • A magazine publisher has a website that displays
    content stored in a database. At times the
    website would get bogged downresponse time would
    become poor and the database server would become
    extremely busy (near-zero idle time).

37
Viewing Wait Events Statistics With Statspack
  • Collect Statspack snapshots at regular intervals.
  • Statspack report shows top wait events for entire
    instance during snapshot interval.
  • Oracle 9i Statspack also shows CPU time used
    during the interval.

38
Statspack Report Output
  • Snap Id Snap Time Sessions
  • -------
    ------------------ --------
  • Begin Snap 61 11-Dec-02
    130052 145
  • End Snap 71 11-Dec-02
    140026 145
  • - - - - - - - - - - - - - - - - - - - - - -
    - - - - - - - -
  • Top 5 Wait Events

  • Wait Total
  • Event Waits Time
    (cs) Wt Time
  • ------------------------ ----------
    ---------- -------
  • buffer busy waits 1,962,372
    1,278,649 50.03
  • db file sequential read 1,336,870
    1,050,878 41.12
  • db file scattered read 47,717
    49,326 1.93
  • direct path write 8,070
    40,574 1.59
  • latch free 38,220
    31,012 1.21

39
What We See in the Statspack Report
  • Dominant wait events
  • buffer busy waits
  • db file sequential read
  • Over 23,000 seconds of wait time on these two
    events in a one hour period (over 6 seconds of
    waiting per elapsed second)

40
Understanding the Buffer Busy Waits Event
  • SQLgt SELECT parameter1, parameter2, parameter3
  • 2 FROM vevent_name
  • 3 WHERE name 'buffer busy waits'
  •  
  • PARAMETER1 PARAMETER2 PARAMETER3
  • ------------ ------------ ------------
  • file block id
  • file Data file containing the desired data
    block
  • block Block within the data file that is
    desired
  • id Reason the buffer in the buffer cache is busy
    (see Metalink bulletin 34405.1)

41
Finding Which Data Blocks Are Experiencing Buffer
Contention
  • SQLgt SELECT sid, event, state, seconds_in_wait,
  • 2 wait_time, p1, p2, p3
  • 3 FROM vsession_wait
  • 4 WHERE event 'buffer busy waits'
  • 5 ORDER BY sid
  •  
  • SID EVENT STATE SEC TIME P1 P2
    P3
  • --- ----------------- ----- --- ---- ----- -----
    -----
  • 12 buffer busy waits WAITE 1 0 30 62157
    130
  • 31 buffer busy waits WAITE 1 0 30 23558
    130

42
Finding Which Data Blocks Are Experiencing Buffer
Contention
  • SQLgt SELECT owner, segment_name, segment_type
  • 2 FROM dba_extents
  • 3 WHERE file_id absolute_file_number
  • 4 AND block_number BETWEEN block_id
  • 5 AND block_id blocks
    -1
  • Enter value for absolute_file_number 30
  • Enter value for block_number 62157
  • OWNER SEGMENT_NAME
    SEGMENT_TYPE
  • ----------------- -------------------
    ------------
  • PRODMGR SAMPLES TABLE

43
Reason Codes from Metalink Bulletin 34405.1
44
What We Have Learned So Far
  • A buffer containing a data block of the SAMPLES
    table is experiencing contention.
  • The buffer in the buffer cache is busy because
    another session is reading the same data block
    from disk.

45
Understanding the DB File Sequential Read Event
  • SQLgt SELECT parameter1, parameter2, parameter3
  • 2 FROM vevent_name
  • 3 WHERE name 'db file sequential read'
  •  
  • PARAMETER1 PARAMETER2 PARAMETER3
  • ------------ ------------ ------------
  • file block blocks
  • file Data file containing the desired data
    block
  • block Block within the data file that is
    desired
  • blocks How many blocks are being read (typically
    1 for db file sequential read)

46
Finding Which Data Blocks Are Being Read
  • SQLgt SELECT sid, event, state, seconds_in_wait,
  • 2 wait_time, p1, p2, p3
  • 3 FROM vsession_wait
  • 4 WHERE event 'db file sequential read'
  • 5 ORDER BY sid
  • SID EVENT STATE SEC TIME P1 P2
    P3
  • --- ----------------- ----- --- ---- ----- -----
    -----
  • 17 db file sequentia WAITE 1 0 30 62042
    1
  • 19 db file sequentia WAITE 1 0 30 61731
    1
  • 33 db file sequentia WAITI 0 0 30 57292
    1

47
Finding Which Data Blocks Are Being Read
  • SQLgt SELECT owner, segment_name, segment_type
  • 2 FROM dba_extents
  • 3 WHERE file_id absolute_file_number
  • 4 AND block_number BETWEEN block_id
  • 5 AND block_id blocks
    -1
  • Enter value for absolute_file_number 30
  • Enter value for block_number 62042
  • OWNER SEGMENT_NAME
    SEGMENT_TYPE
  • ----------------- -------------------
    ------------
  • PRODMGR SAMPLES TABLE

48
The SAMPLES Table
  • Contained a LONG column with very large values
  • Excessive row chaining
  • Most queries did not retrieve the LONG data
  • Table assigned to KEEP pool, but too large to fit
    entirely in memory

49
Long-Term Problem Resolution
  • Convert the LONG column to a CLOB.
  • Large CLOB data will be stored in a separate LOB
    segment.
  • Row chaining will be reduced or eliminated.
  • The table segment will be much smaller and more
    likely to fit in memory.

50
Short-Term Problem Resolution
  • Added index on most columns of SAMPLES table
  • Allowed most queries to avoid table segment
  • Enlarged KEEP pool
  • Allowed index segment to fit in memory

51
Statspack Report Output
  • Snap Id Snap Time Sessions
  • -------
    ------------------ --------
  • Begin Snap 1192 20-Dec-02
    130049 102
  • End Snap 1202 20-Dec-02
    140018 102
  • - - - - - - - - - - - - - - - - - - - - - -
    - - - - - - - -
  • Top 5 Wait Events

  • Wait Total
  • Event Waits Time
    (cs) Wt Time
  • ------------------------ ----------
    ---------- -------
  • direct path write 6,467
    13,545 30.61
  • log file sync 4,914
    7,493 16.93
  • library cache pin 1,175
    6,090 13.76
  • direct path read 5,488
    3,428 7.75
  • latch free 14,528
    2,931 6.62

52
What We See in the Statspack Report Now
  • No db file sequential read or buffer busy waits.
  • All data was already in the buffer cache.
  • Physical reads reduced by over 90.
  • Total wait time on all non-idle events reduced by
    over 98.
  • Before 12,786.49 / 0.5003 25,557.65
  • After 135.45 / 0.3061 442.50

53
What We Learned from Wait Event Information 
  • Large amounts of time were being spent waiting on
    single block disk reads and buffer contention in
    the buffer cache.
  • Random samples showed the disk reads and
    contention involved the SAMPLES table.
  • The buffer contention was the result of multiple
    sessions needing the same block from disk.
  • Wait events pointed us directly to the problem
    segment.

54
Example 2 More Buffer Busy Waits, Plus Latch
Contention
  • A genetic research company stored their data in
    Oracle. Applications running concurrently on many
    workstations would fetch raw data, process it,
    and put the data back in the database. But
    throughput bogged down as they added more
    workstations.

55
Activating Wait Event Tracing
  • Added to application code on workstation 30
  • ALTER SESSION SET events
  • '10046 trace name context forever, level 8'
  • Could have used dbms_support if it was installed
  • dbms_support.start_trace
  • Modified application code to exit after 500
    iterations

56
TKPROF Wait Events Reporting in Oracle 9i
  • tkprof prodgen_ora_16466.trc report_16466.prf
    waitsyes

57
TKPROF Report Output
  • UPDATE processing_stations
  • SET status 'ACTIVE', status_date SYSDATE,
  • data_set_id_being_processed b1
  • WHERE station_id 30
  • call count cpu elapsed disk query
    current rows
  • ------- ------ ------- --------- ----- -----
    ------- -----
  • Parse 1 0.01 0.00 0 0
    0 0
  • Execute 500 0.23 10.14 0 3616
    1010 500
  • Fetch 0 0.00 0.00 0 0
    0 0
  • ------- ------ ------- --------- ----- -----
    ------- -----
  • total 501 0.24 10.14 0 3616
    1010 500
  • Elapsed times include waiting on following
    events
  • Event waited on Times Max. Wait
    Total Waited
  • --------------------------- Waited ----------
    ------------
  • buffer busy waits 26 0.71
    7.87
  • latch free 17 0.57
    2.08
  • log file switch completion 3 0.09
    0.20

58
What We See In the TKPROF Report
  • 500 trivial updates took 10.14 seconds
  • Most of that time was spent waiting
  • Dominant wait events
  • buffer busy waits
  • latch free
  • CPU time plus wait time does not add up to
    elapsed time due to round-off errors
  •  

59
Waits In the Trace File
  • WAIT 2 nam'buffer busy waits' ela 527727
    p118 p210 p3220
  • WAIT 2 nam'buffer busy waits' ela 498765
    p118 p210 p3220
  • WAIT 2 nam'buffer busy waits' ela 137611
    p118 p210 p3220
  • WAIT 2 nam'buffer busy waits' ela 124165
    p118 p210 p3220
  • WAIT 2 nam'buffer busy waits' ela 5237 p118
    p210 p3220
  • WAIT 2 nam'buffer busy waits' ela 264050
    p118 p210 p3220
  • WAIT 2 nam'buffer busy waits' ela 270177
    p118 p210 p3220
  • WAIT 2 nam'buffer busy waits' ela 330912
    p118 p210 p3220
  • WAIT 2 nam'buffer busy waits' ela 156317
    p118 p210 p3220
  • WAIT 2 nam'buffer busy waits' ela 710696
    p118 p210 p3220
  • Elapsed times are in microseconds in Oracle 9i

60
Finding Which Data Blocks Are Experiencing Buffer
Contention
  • SQLgt SELECT owner, segment_name, segment_type
  • 2 FROM dba_extents
  • 3 WHERE file_id absolute_file_number
  • 4 AND block_number BETWEEN block_id
  • 5 AND block_id blocks
    -1
  • Enter value for absolute_file_number 18
  • Enter value for block_number 10
  • OWNER SEGMENT_NAME
    SEGMENT_TYPE
  • ----------------- -------------------
    ------------
  • GEN PROCESSING_STATIONS TABLE

61
Reason Codes from Metalink Bulletin 34405.1
62
What We Have Learned So Far
  • A buffer containing a data block of the
    PROCESSING_STATIONS table is experiencing
    contention.
  • The buffer in the buffer cache is busy because
    another session has the buffer in an incompatible
    mode.
  • All 26 buffer busy waits totaling 7.87 seconds
    involved the same data block.

63
The PROCESSING_STATIONS Table
  • SQLgt SELECT SYSDATE - last_analyzed, blocks,
  • 2 avg_row_len, avg_space, num_rows
  • 3 FROM user_tables
  • 4 WHERE table_name 'PROCESSING_STATIONS'
  • SYSDATE- AVG_ AVG_
  • LAST_ANALYZED BLOCKS ROW_LEN SPACE NUM_ROWS
  • ------------- ------ ------- ----- --------
  • 2.132118056 1 62 1686 100

64
Two Important Observations
  • There were 100 workstations running the
    processing application concurrently.
  • The trace we ran on workstation 30 completed in
    just under one minute.

65
Lots of Updates!
  • Workstation 30 updated the PROCESSING_STATIONS
    table 500 times in less than one minute.
  • If all 100 workstations do similar things More
    than 50,000 updates to one data block every
    minute by 100 concurrent sessions!

66
Why So Many Updates?
  • Workstations use the PROCESSING_STATIONS table to
    track which workstation is processing which data
    set.
  • Processing one data set takes between 0.1 second
    and 20 minutes.
  • Workstations update the table frequently to keep
    the timestamp current. This would be helpful in
    the event of a workstation crash.

67
Long-Term Problem Resolution
  • Modify the application to update the
    PROCESSING_STATIONS table less frequentlyonce
    per data set or once per second for larger data
    sets
  • Will reduce updates by over 80
  • Buffer busy waits will disappear or dramatically
    decrease

68
Short-Term Problem Resolution
  • Rebuilt the PROCESSING_STATIONS table with
    PCTFREE set to 99
  • Oracle reserved 99 of each data block for future
    row expansion.
  • Each row got its own data block.
  • Each workstation session now updates a separate
    data block.

69
The Rebuilt PROCESSING_STATIONS Table
  • SQLgt SELECT SYSDATE - last_analyzed, blocks,
  • 2 avg_row_len, avg_space, num_rows
  • 3 FROM user_tables
  • 4 WHERE table_name 'PROCESSING_STATIONS'
  • SYSDATE- AVG_ AVG_
  • LAST_ANALYZED BLOCKS ROW_LEN SPACE NUM_ROWS
  • ------------- ------ ------- ----- --------
  • .130868056 100 62 8014 100

70
TKPROF Report Output
  • UPDATE processing_stations
  • SET status 'ACTIVE', status_date SYSDATE,
  • data_set_id_being_processed b1
  • WHERE station_id 30
  • call count cpu elapsed disk query
    current rows
  • ------- ------ ------- --------- ----- -----
    ------- -----
  • Parse 1 0.00 0.00 0 0
    0 0
  • Execute 500 0.20 2.22 0 500
    1009 500
  • Fetch 0 0.00 0.00 0 0
    0 0
  • ------- ------ ------- --------- ----- -----
    ------- -----
  • total 501 0.20 2.22 0 500
    1009 500
  • Elapsed times include waiting on following
    events
  • Event waited on Times Max. Wait
    Total Waited
  • --------------------------- Waited ----------
    ------------
  • latch free 2 0.35
    0.61

71
What We See in the TKPROF Report Now
  • 500 updates took 2.22 seconds, down from 10.14
    seconds
  • No more buffer busy waits
  • Waited 0.61 seconds on latches, down from 2.08
    seconds
  • CPU time was 0.20 seconds, down from 0.23 seconds
  • 1.41 seconds unaccounted forlikely a mix of
    waiting for CPU and round-off error

72
Understanding the Latch Free Event
  • SQLgt SELECT parameter1, parameter2, parameter3
  • 2 FROM vevent_name
  • 3 WHERE name 'latch free'
  •  
  • PARAMETER1 PARAMETER2 PARAMETER3
  • ------------ ------------ ------------
  • address number tries
  •        
  • address Join to addr in vlatch      
  • number Join to latch in vlatchname
  • tries Number of times the session has waited
    while trying to acquire the latch

73
Waits In the Trace File
  • WAIT 2 nam'latch free' ela 47004
    p115113593728 p297 p30
  • WAIT 2 nam'latch free' ela 14629
    p115113593728 p297 p31
  • WAIT 2 nam'latch free' ela 20652
    p115113593728 p297 p32
  • WAIT 2 nam'latch free' ela 37737
    p115113593728 p297 p33
  •  
  • Four consecutive waits for one acquisition of the
    latch

74
Finding Which Latches Are Experiencing Contention
  • SQLgt SELECT latch, name
  • 2 FROM vlatchname
  • 3 WHERE latch latch_number
  •  
  • Enter value for latch_number 97
  •  
  • LATCH NAME
  • ---------- --------------------
  • 97 cache buffers chains

75
What We Learned from Wait Event Information
  • Much time was spent waiting on latch contention
    and buffer contention in the buffer cache.
  • The buffer contention was all for one data block.
  • The buffer contention was the result of multiple
    sessions needing to update the same data block.
  • The latch contention involved the latch that
    protects the buffer cache chains data structure.
  • Wait events pointed us directly to the hot buffer
    in the buffer cache.

76
Example 3 Log File Waits
  • A data warehouse loader application was tuned in
    a test environment until it met user acceptance.
    The production server was larger and more
    powerful, but the data loads actually took longer
    in production than in the test environment.

77
Summarizing Wait Events During A Period of Time
  • vsystem_event shows wait event totals since
    instance startup.
  • vsession_event shows wait event totals since the
    beginning of a session.
  • You can capture view contents at different points
    in time and compute the delta in order to get
    wait event information for a specific period of
    time.
  • Statspack and many third-party tools can do this,
    but a simple script has less overhead and can be
    quicker to deploy.

78
Simple Script to See Wait Events During a 30
Second Period
  • CREATE TABLE previous_events AS
  • SELECT SYSDATE timestamp, vsystem_event.
  • FROM vsystem_event
  • EXECUTE dbms_lock.sleep (30)
  • SELECT A.event,
  • A.total_waits
  • - NVL (B.total_waits, 0) total_waits,
  • A.time_waited
  • - NVL (B.time_waited, 0) time_waited
  • FROM vsystem_event A, previous_events B
  • WHERE A.event NOT IN (list of idle events)
  • AND B.event () A.event
  • ORDER BY time_waited

79
Wait Events During 30 Seconds of Data Loading
  • EVENT TOTAL_WAITS
    TIME_WAITED
  • ---------------------------- -----------
    -----------
  • control file sequential read 61
    1
  • latch free 2
    1
  • db file sequential read 6
    7
  • control file parallel write 41
    31
  • log file single write 6
    164
  • db file parallel write 13
    220
  • enqueue 6
    486
  • log buffer space 24
    2007
  • log file sequential read 30
    2655
  • log file switch completion 33
    2883
  • log file parallel write 19
    3561
  • log file sync 113
    10249

80
What We See in the Script Output
  • Over 215 seconds spent waiting on log-related
    events
  • Sessions waited 102 seconds for LGWR to flush the
    log buffer to disk for a commit
  • Sessions waited 48 seconds for LGWR to make space
    for more redo
  • LGWR waited 37 seconds for disk writes
  • ARCH waited 26 seconds for disk reads

81
Investigate the Redo Log
  • Check production online redo log location for
    contention and slow hardware
  • All log files located on one disk
  • Same disk held hot files for another database
  • Compare to test environment
  • Log files located on a striped volume
  • Minimal other activity on the volume
  • Database in NOARCHIVELOG mode

82
Problem Resolution
  • Sped up online redo log file performance
  • Moved log files to dedicated disks
  • Spread log files over multiple disks

83
What We Learned from Wait Event Information
  • The disk I/O speed writing and reading the online
    redo log files was the bottleneck slowing down
    the data warehouse load.
  • Wait events pointed us directly to the area
    within Oracle that was holding up the works.

84
Example 4 Direct Path I/O Waits
  • Analysts in a customer service unit were
    satisfied with the response time when they
    queried individual customer orders from their
    data warehouse. However, queries involving
    summarizations of multiple orders were
    unacceptably slow.

85
Database Rx Wait Event Report
86
What We See in the Database Rx Report
  • Dominant wait events
  • direct path write
  • db file scattered read
  • direct path read
  • Above account for 99 of non-idle event wait time
  • Insignificant db file sequential read waits

87
What We Have Learned So Far
  • Large amount of direct path I/O activity
  • Usually involves temporary segments
  • Significant multi-block I/O reads
  • Full table scans are common in a data warehouse
    environment
  • Insignificant single-block I/O reads
  • Frequently accessed data blocks probably in
    buffer cache

88
Understanding the Direct Path I/O Events
  • SQLgt SELECT name, parameter1, parameter2,
    parameter3
  • 2 FROM vevent_name
  • 3 WHERE name LIKE 'direct path'
  • NAME PARAMETER1 PARAMETER2
    PARAMETER3
  • ----------------- ----------- ----------
    ----------
  • direct path read file number first dba block
    cnt
  • direct path write file number first dba block
    cnt
  • file number File containing data block
  • first dba First block within the file to be
    accessed
  • block cnt Number of blocks to be accessed

89
Finding Which Files Are Being Accessed
  • SQLgt SELECT sid, event, state, seconds_in_wait,
  • 2 wait_time, p1, p2, p3
  • 3 FROM vsession_wait
  • 4 WHERE event 'direct path write'
  • 5 ORDER BY sid
  • SID EVENT STATE SEC TIME P1 P2
    P3
  • --- ----------------- ----- --- ---- ----- -----
    -----
  • 39 direct path write WAITI 0 0 201 65
    7
  • 47 direct path write WAITI 0 0 201 2248
    7

90
Finding Which Files Are Being Accessed
  • SQLgt SELECT tablespace_name, file_id "AFN"
  • 2 FROM dba_data_files
  • 3 WHERE file_id 201
  • no rows selected
  • SQLgt SELECT tablespace_name, file_id value
    "AFN"
  • 2 FROM dba_temp_files, vparameter
  • 3 WHERE name 'db_files'
  • 4 AND file_id value 201
  • TABLESPACE_NAME AFN
  • ------------------------------ ----------
  • TEMP 201

91
Problem Resolution
  • Increased sort_area_size
  • Was set to default of 65536
  • Increased to 10485760 (few concurrent sessions)
  • If that had not solved the problem
  • Tune application code to reduce sorting
  • Check for other active files on disks holding
    temp files
  • Move temp files to a striped volume

92
What We Learned from Wait Event Information
  • Direct path I/O accounted for 75 of the non-idle
    event wait time on the system.
  • Multi-block reads accounted for 24 of the
    non-idle event wait timenot unusual in a data
    warehouse environment.
  • Random samples showed direct path I/O involved
    the temporary tablespace.
  • Wait events pointed us directly to the area
    within Oracle that needed adjustment.

93
Logical vs. Physical Reads
94
Logical vs. Physical Reads
  • During the Database Rx sample interval there were
    more physical reads than logical reads.
  • Direct path reads count as physical reads but not
    logical reads.
  • Be careful how you compute your buffer cache hit
    ratiosin this example you might come up with a
    negative figure!

95
Example 5 Database Link Wait Events
  • A company had five Oracle databases, one per
    region. Due to human error, the same customer
    transactions would sometimes get loaded into
    multiple databases. A report was built to
    identify these duplicates, but it took 30 minutes
    to run.

96
Isolating a Query and Analyzing Its Wait Events
  • Start a new database session in SQLPlus or a
    similar tool.
  • Run the query.
  • Monitor the sessions wait events and statistics
    from another session
  • vsession_event
  • vsesstat
  • This is a handy technique when you know which
    statement is the bottleneck.

97
Query Output from vsession_event
  • SQLgt SELECT event, total_waits, time_waited,
    max_wait
  • 2 FROM vsession_event
  • 3 WHERE sid 47
  • 4 ORDER BY event
  • EVENT TOTAL_WAITS
    TIME_WAITED MAX_WAIT
  • --------------------------- -----------
    ----------- ----------
  • SQLNet message from client 32
    4435 2432
  • SQLNet message from dblink 1525516
    104919 31
  • SQLNet message to client 33
    0 0
  • SQLNet message to dblink 1525516
    466 9
  • db file sequential read 27199
    8025 28
  • latch free 40
    5 4
  • log file sync 1
    2 2

98
Query Output from vsesstat
  • SQLgt SELECT A.name, B.value
  • 2 FROM vstatname A, vsesstat B
  • 3 WHERE A.statistic 12
  • 4 AND B.statistic A.statistic
  • 5 AND B.sid 47
  • NAME VALUE
  • ------------------------------ ----------
  • CPU used by this session 67937

99
What We See In the v Data
  • 1.5 million waits on network roundtrips through a
    database link 1053 seconds
  • Network latency
  • Time for the remote database to respond to each
    request
  • 27,000 waits for single-block disk reads 80
    seconds

100
The Query We Are Studying
  • SELECT customer_id, batch_serial_number,
    batch_date,
  • load_date, batch_comment, control_total
  • FROM customer_xfer_batches A
  • WHERE exists
  • (SELECT 1
  • FROM customer_xfer_batches_at_prdwest B
  • WHERE B.customer_id A.customer_id
  • AND B.batch_serial_number
  • A.batch_serial_number)
  • ORDER BY customer_id, batch_serial_number

101
The Query We Are Studying
  • Execution Plan
  • --------------------------------------------------
    --------
  • 0 SELECT STATEMENT
  • 1 0 FILTER
  • 2 1 TABLE ACCESS (BY INDEX ROWID) OF
    'CUSTOMER_XFER_BATCHES'
  • 3 2 INDEX (FULL SCAN) OF
    'CUST_XFER_BAT_PK' (UNIQUE)
  • 4 1 REMOTE PRDWEST
  • 4 SERIAL_FROM_REMOTE SELECT "CUSTOMER_ID","BATCH_
    SERIAL_NUMBER"
  • FROM "CUSTOMER_XFER_BATCHES"
    "B"
  • WHERE "BATCH_SERIAL_NUMBER"
    1
  • AND "CUSTOMER_ID"2

102
CUSTOMER_XFER_BATCHES
  • SQLgt SELECT blocks, num_rows
  • 2 FROM user_tables
  • 3 WHERE table_name
  • 4 'CUSTOMER_XFER_BATCHES'
  • BLOCKS NUM_ROWS
  • ------ --------
  • 21825 1526003

103
What We Have Learned So Far
  • Oracle is doing a full scan of the index on the
    local table and fetching each row one at a time
  • This does avoid a sort
  • Very high price to pay to skip sorting a few rows
  • Oracle is doing one remote query for each row
    fetched from the local table

104
Problem Resolution - Part 1
  • SELECT customer_id, batch_serial_number,
    batch_date,
  • load_date, batch_comment, control_total
  • FROM customer_xfer_batches
  • WHERE (customer_id, batch_serial_number) IN
  • (SELECT customer_id, batch_serial_number
  • FROM customer_xfer_batches
  • INTERSECT
  • SELECT customer_id, batch_serial_number
  • FROM customer_xfer_batches_at_prdwest)
  • ORDER BY customer_id, batch_serial_number

105
Query Output from vsession_event
  • SQLgt SELECT event, total_waits, time_waited,
    max_wait
  • 2 FROM vsession_event
  • 3 WHERE sid 49
  • 4 ORDER BY event
  • EVENT TOTAL_WAITS
    TIME_WAITED MAX_WAIT
  • --------------------------- -----------
    ----------- ----------
  • SQLNet message from client 46
    3680 2481
  • SQLNet message from dblink 24
    31 18
  • SQLNet message to client 47
    0 0
  • SQLNet message to dblink 24
    0 0
  • SQLNet more data from dbli 5978
    1337 13
  • db file scattered read 3430
    675 8
  • db file sequential read 182
    60 2
  • direct path read 148
    233 11
  • direct path write 920
    3572 33

106
Query Output from vsesstat
  • SQLgt SELECT A.name, B.value
  • 2 FROM vstatname A, vsesstat B
  • 3 WHERE A.statistic 12
  • 4 AND B.statistic A.statistic
  • 5 AND B.sid 49
  • NAME VALUE
  • ------------------------------ ----------
  • CPU used by this session 3227

107
What We See in the v Data Now
  • 24 network roundtrips through a database link
    instead of 1.5 million 14 seconds (down from
    1053)
  • Fewer, larger network packets
  • Fewer requests to remote database
  • 3,600 waits on mostly multi-block disk reads
    instead of 27,000 waits on single-block disk
    reads 7 seconds (down from 80)
  • Fewer multi-block reads instead of many
    single-block reads

108
What We See in the v Data Now
  • 1100 waits on direct path I/O 38 seconds (new)
  • Sorting to implement the INTERSECT operation
  • 32 seconds of CPU time (down from 679)
  • Fewer logical reads and network roundtrips
  • Elapsed time 92 seconds (down from
  • 31 minutes)

109
Iterative Tuning
  • Curing one bottleneck often reveals or creates
    another, smaller bottleneck.
  • Repeat the wait event evaluation process after
    each change until performance goals are met.
  • In this situation, a 95 reduction in runtime
    from 31 minutes to 92 seconds still did not meet
    the performance goal.

110
What We Have So Far
  • Rewritten query completes in 92 seconds
  • 32 CPU seconds
  • 38 seconds of wait on direct path I/O
  • 14 seconds of wait on network roundtrips
  • 7 seconds of wait on multi-block and single-block
    reads

111
Problem Resolution - Part 2
  • Eliminating or speeding up direct path I/O seems
    like the logical next step
  • sort_area_size set to 1 Mb
  • Try dynamically changing it to 100 Mb?

112
Query Output from vsession_event
  • SQLgt SELECT event, total_waits, time_waited,
    max_wait
  • 2 FROM vsession_event
  • 3 WHERE sid 46
  • 4 ORDER BY event
  • EVENT TOTAL_WAITS
    TIME_WAITED MAX_WAIT
  • --------------------------- -----------
    ----------- ----------
  • SQLNet message from client 47
    442 287
  • SQLNet message from dblink 25
    25 14
  • SQLNet message to client 48
    0 0
  • SQLNet message to dblink 25
    0 0
  • SQLNet more data from dbli 6050
    1378 26
  • db file scattered read 3430
    945 8
  • db file sequential read 191
    59 1
  • log file sync 1
    3 3

113
Query Output from vsesstat
  • SQLgt SELECT A.name, B.value
  • 2 FROM vstatname A, vsesstat B
  • 3 WHERE A.statistic 12
  • 4 AND B.statistic A.statistic
  • 5 AND B.sid 46
  • NAME VALUE
  • ------------------------------ ----------
  • CPU used by this session 3296

114
What We See in the v Data Now
  • Waits on network roundtrips through a database
    link, multi-block reads, and single-block reads
    unchanged
  • CPU time used unchanged
  • Direct path I/O waits eliminated completely
  • Entire sort now performed in memory
  • Elapsed time 55 seconds (down from 92)

115
What We Learned from Wait Event Information
  • A query ran slowly due to excessive network
    roundtrips and single-block reads.
  • After these problems were corrected, 40 of the
    query execution time was devoted to sorting to
    disk.
  • Wait events showed us how Oracle was spending its
    time while executing the query, helping us
    improve the querys performance in an iterative
    fashion.

116
A Summary Of Wait Event Techniques
  • Using Statspack snapshots and reports to analyze
    wait events at the instance level
  • Polling vsession_wait to determine which buffers
    or latches have contention
  • Enabling wait event tracing in a session
  • Using Oracle 9i TKPROF to tabulate waits at the
    statement level within one session

117
A Summary Of Wait Event Techniques (continued)
  • Collecting wait event data for a session or the
    entire instance at two different times and
    computing the difference to find the wait events
    during a specific period of time
  • Ranking cumulative wait event data in order to
    see which wait events account for the most wait
    time
  • Isolating a statement and analyzing its wait
    events

118
Send Us Your Wait Event Puzzles
  • We are always looking for interesting wait event
    situations to learn from!
  • If you are trying to diagnose a problem using the
    wait event interface, feel free to email us wait
    events data and a problem description.
  • Well do our best to look over what you send us
    and share our thoughts with you.

119
The White Paper
  • A companion white paper to this presentation is
    available for free download from our companys
    website at
  • www.dbspecialists.com/presentations.html

120
Resources from Database Specialists
  • The Specialist newsletter
  • www.dbspecialists.com/specialist.html
  • Database Rx
  • dbrx.dbspecialists.com/guest
  • Provides secure, automated monitoring, alert
    notification, and analysis of your Oracle
    databases

121
In Conclusion
  • The wait event interface gives you access to a
    detailed accounting of how Oracle processes spend
    their time.
  • Wait events touch all aspects of the Oracle
    database server.
  • The wait event interface will not always give you
    the answer to every performance problem, but it
    will just about always give you insights that
    guide you down the proper path to problem
    resolution.

122
Contact Information
  • Roger Schrag
  • rschrag_at_dbspecialists.com
  • Terry Sutton
  • tsutton_at_dbspecialists.com
  • Database Specialists, Inc.
  • 388 Market Street, Suite 400
  • San Francisco, CA 94111
  • Tel 415/344-0500
  • Web www.dbspecialists.com
Write a Comment
User Comments (0)
About PowerShow.com