ASE 131: Troubleshooting ASE - PowerPoint PPT Presentation

1 / 57
About This Presentation
Title:

ASE 131: Troubleshooting ASE

Description:

Contains most CRs reported by customers starting in May of 2002. ... The event based dump is called a Configurable Shared Memory Dump or CSMD ... – PowerPoint PPT presentation

Number of Views:91
Avg rating:3.0/5.0
Slides: 58
Provided by: fellenm
Category:

less

Transcript and Presenter's Notes

Title: ASE 131: Troubleshooting ASE


1
ASE 131 Troubleshooting ASE
David WeinPrincipal Product Support
Engineer david.wein_at_sybase.com July 17th, 2003
2
Agenda
  • Introduction
  • Presentation Goals
  • Problem Solving Resources
  • Understanding ASE Diagnostics
  • Finding information in error messages
  • Decoding stack traces
  • Picking good key words for research
  • Trace Flags
  • The shared memory dump facility
  • Types of Problems
  • Performance Issues
  • Locking / Blocking / Deadlocking
  • Low-Level Task Related Problems
  • Backup and Recovery Issues

3
Presentation Goals
  • Help you help yourself
  • What resources are available to customers?
  • Understand how Sybase Tech Support works on
    problems
  • What is the process? What can you do to help?
  • Understand ASEs diagnostics
  • Errors, stack traces, and memory dumps
  • Understand how to deal with specific problems
    that may be encountered with ASE
  • Minimize impact by understanding what is
    happening within ASE and how to handle it.

4
Online Problem Solving Resources
  • Online support home is http//www.sybase.com/suppo
    rt
  • Product Manuals
  • Technical Documents
  • Newsgroups
  • Solved Cases / Existing Change Requests
  • EBF Downloads
  • Case Management
  • ASE documents are very strong
  • Troubleshooting Guide
  • Early chapters are a wealth of information on
    getting out of trouble and staying out of
    trouble.
  • Also contains detailed write-ups on many ASE
    messages and should always be checked as a first
    step.
  • System Administration (SAG) and Performance and
    Tuning (PT) guides are also very useful for
    understanding the area of ASE where you may be
    having problems

5
Online Problem Solving Resources, Continued
  • Public newsgroups are an invaluable service
  • Over a dozen ASE newsgroups covering general
    issue to specific topcis such as High
    Availability and Linux
  • Searchable archive
  • Informally monitored by many Sybase employees and
    lots of experienced customers
  • Solved cases contains over 17,000 searchable
    support cases with answers
  • Common problems and their answers are well
    documented in this knowledge base.
  • Hundreds of additional entries are added every
    month
  • Change Requests (CRs or bugs) are also searchable
  • Contains most CRs reported by customers starting
    in May of 2002.
  • Contains a summary and applicable version

6
PT Newsgroup Screenshot
7
Agenda
  • Introduction
  • Problem Solving Resources
  • Working with Sybase Tech Support
  • Change Request / Bug Fix Process
  • Understanding ASE Diagnostics
  • Finding information in error messages
  • Decoding stack traces
  • Picking good key words for research
  • Trace flags
  • The shared memory dump facility
  • Types of Problems
  • Performance Issues
  • Locking / Blocking / Deadlocking
  • Low-Level Task Related Problems
  • Backup and Recovery Issues

8
ASE Error Message Formats
  • Messages that appear in the log have a header
    string that provides good information
  • 0200000000562002/12/16 234049.14 kernel
    current process (0x11890001) infected with 11
  • Use this information to isolate overlapping
    messages on multi-engine systems

Date / Time
Family ID
ASE Layer
Process ID
Engine ID
9
ASE Error Message Formats, Continued
  • ASE uses both numbered and unnumbered messages
  • The numbered messages correspond to what is
    found in the sysmessages table.
  • The unnumbered messages are more internal in
    nature, do not exist in sysmessages, and are not
    sent to the client.
  • Examples
  • Numbered message
  • Msg 208, Level 16, State 1
  • Line 1 foo not found. Specify owner.objectname
    or use sp_help to check whether the object exists
    (sp_help may produce lots of output).
  • Unnumbered message
  • 2002/09/05 164449.82 kernel  sddone write
    error on virtual disk 16 block 3990498

10
Severity and State
  • Numbered messages consist of four components
  • The message number itself, i.e. 208, 695, 1105,
    etc.
  • A severity level between 10 and 24.
  • Severity determines how ASE responds to the error
  • Levels 10 18 represent a user error. These are
    non-fatal. 10 16 can be corrected by the user.
    17 and 18 may require DBA intervention.
  • Levels 19 26 are fatal and will result in a
    terminated client session. These are more severe
    problems and represent internal inconsistencies
  • Severity determines how the error is reported
  • Non-SA client only receive non-fatal errors (lt
    18). They receive an alternate message for
    severity 19 and higher.
  • SA clients receive errors regardless of severity
  • Only severity 19 and higher are sent to the
    errorlog / console.
  • If traceflag 3602 is enabled, severity 10-18 are
    also sent to the errorlog.
  • A state value
  • States identify the instance of an error. For
    errors raised several places, this isolates the
    code location that raised the instance.
  • This is very important for tech support!
  • The actual text of the message

11
Stack Traces
An invaluable diagnostic tool
  • Stack traces provide useful information relating
    to what a task was doing when a fatal condition
    was encountered
  • Stack traces go to the errorlog and console when
    a fatal error is encountered
  • Fatal conditions such as time slice violations
    and signals also cause stack traces to be
    generated
  • Trace flag 3601 was cause a stack trace to be
    generated for non-fatal errors as well, but this
    is not normally advisable.
  • Stack traces are often preceded or followed by
    relevant information
  • Be sure to save the entire errorlog if you see a
    stack trace
  • If you are working with Tech Support, they will
    want to see the entire log, not just a cut and
    paste of the stack trace.
  • Picking good key words out of a stack may help
    you find the cause online
  • Using good key words for searches may turn up the
    answer in the known problems or solved cases
    databases.

12
Stack Traces, Continued
An example stack trace
  • 0500000000742002/06/21 085821.17 kernel 
    current process (0x128b007b) infected with
    11
  • 0500000000742002/06/21 085821.23 kernel 
    Address 0x0068b7f8 (getnext_ctlib_subst0x138),
    siginfo (code, address) (1, 0x0000010a)
  • 0500000000742002/06/21 085821.23 kernel 
  • 0500000000742002/06/21 085821.23 kernel  SQL
    causing error execute etcm.etcm.ap_get_ba_addres
    s1 _at_country 'CAN', _at_addr_type 'I', _at_basicba
    '999999', _at_addrsub '99', _at_baname ''
  • 0500000000742002/06/21 085821.23 kernel 
  • 0500000000742002/06/21 085821.23 server  SQL
    Text execute etcm.etcm.ap_get_ba_address1
    _at_country 'CAN', _at_addr_type 'I', _at_basicba
    '999999', _at_addrsub '99', _at_baname ''
  • 0500000000742002/06/21 085821.23 kernel 
    curdb 13 pstat 0x10100 lasterror 0
  • 0500000000742002/06/21 085821.23 kernel 
    preverror 0 transtate 0
  • 0500000000742002/06/21 085821.23 kernel 
    curcmd 197 program

13
Stack Traces, Continued
  • 0500000000742002/06/21 085821.24 kernel  pc
    0x0064f8dc pcstkwalk0x24(0x29b869a0, 0x00000000,
    0x0000270f, 0x00000002, 0xfffffff8)
  • 0500000000742002/06/21 085821.24 kernel  pc
    0x0064f7e8 ucstkgentrace0x194(0x128b007b,
    0x2d3ccd58, 0x2d3ccd58, 0x2bb795c8, 0x00000000)
  • 0500000000742002/06/21 085821.24 kernel  pc
    0x0061cc90 ucbacktrace0xa8(0x2bb795c8,
    0x00000001, 0x128b007b, 0x2d14f238, 0x00000000)
  • 0500000000742002/06/21 085821.25 kernel  pc
    0x000f62e0 terminate_process0xd70(0x00beef50,
    0x000054e8, 0x000054e4, 0xffffffff, 0x00005000)
  • 0500000000742002/06/21 085821.25 kernel  pc
    0x006324d4 kisignal0x1e0(0x29b8745c, 0x29b871c4,
    0x0000000b, 0x29b87198, 0x29b87450)
  • 0500000000742002/06/21 085821.26 kernel  pc
    0xff34b7dc _getfp0x220(0x0000000b, 0x29b87450,
    0x29b87198, 0x006322f4, 0x00bf3424)
  • 0500000000742002/06/21 085821.26 kernel  pc
    0xff3484c0 _fork0x864(0x0000000b, 0x00bf3380,
    0x00000000, 0x00000000, 0x00000000)

14
Stack Traces, Continued
  • 0500000000742002/06/21 085821.26 kernel  pc
    0x0068b77c getnext_ctlib_subst0xbc(0x00000200,
    0x00000000, 0x00000000, 0x00000000, 0x38433e50)
  • 0500000000742002/06/21 085821.26 kernel 
    Handler pc 0x006886ec omni_backout installed by
    the following function-
  • 0500000000742002/06/21 085821.26 kernel  pc
    0x0068b490 omni_getnext0x2c0(0x38433e50,
    0x2c65f574, 0x2d3ccd58, 0x2c662c5c, 0x2d3d2298)
  • 0500000000742002/06/21 085821.27 kernel  pc
    0x0042a760 exec_eop0x1750(0x00002000,
    0x2d3ccd58, 0x00000000, 0x00000000, 0x000005c5)
  • 0500000000742002/06/21 085821.27 kernel  pc
    0x004297ac exec_eop0x79c(0x00007000, 0x2d3ccd58,
    0x45478880, 0x00000000, 0x00000000)
  • 0500000000742002/06/21 085821.28 kernel 
    Handler pc 0x0042e510 execerr installed by the
    following function-
  • 0500000000742002/06/21 085821.28 kernel 
    Handler pc 0x0056f508 jcsExHandler installed by
    the following function-
  • 0500000000742002/06/21 085821.30 kernel 
    Handler pc 0x003dfe84 sortmerr installed by the
    following function-

15
Stack Traces, Continued
  • 0500000000742002/06/21 085821.30 kernel 
    Handler pc 0x0042e510 execerr installed by the
    following function-
  • 0500000000742002/06/21 085821.30 kernel 
    Handler pc 0x0042e510 execerr installed by the
    following function-
  • 0500000000742002/06/21 085821.30 kernel 
    Handler pc 0x00365b38 aritherr installed by the
    following function-
  • 0500000000742002/06/21 085821.30 kernel  pc
    0x00428960 execute0xb80(0x455e96d0, 0x0000702c,
    0x455e96d0, 0x2d3ccd58, 0x00005000)
  • 0500000000742002/06/21 085821.30 kernel  pc
    0x0038aebc s_execute0x2404(0x80000000,
    0x000000c5, 0x455e9348, 0x00005393, 0x00001000)
  • 0500000000742002/06/21 085821.30 kernel 
    Handler pc 0x003a4e50 s_handle installed by the
    following function-
  • 0500000000742002/06/21 085821.30 kernel  pc
    0x003a1e8c sequencer0x10d8(0x00be0400,
    0x00beec00, 0x29025200, 0x00005388, 0x2d3ccd58)
  • 0500000000742002/06/21 085821.30 kernel  pc
    0x003921f8 execproc0x79c(0x00005388, 0x00007000,
    0x2d3ccd58, 0x00005000, 0x0000001f)

16
Stack Traces, Continued
  • 0500000000742002/06/21 085821.30 kernel  pc
    0x0038b3d4 s_execute0x291c(0x00beec00,
    0x00be2800, 0x29025200, 0x00000008, 0x00000000)
  • 0500000000742002/06/21 085821.30 kernel 
    Handler pc 0x003a4e50 s_handle installed by the
    following function-
  • 0500000000742002/06/21 085821.30 kernel  pc
    0x003a1e8c sequencer0x10d8(0x00be0400,
    0x00beec00, 0x29025200, 0x00005388, 0x2d3ccd58)
  • 0500000000742002/06/21 085821.30 kernel  pc
    0x003921f8 execproc0x79c(0x00005388, 0x00007000,
    0x2d3ccd58, 0x00005000, 0x0000001b)
  • 0500000000742002/06/21 085821.30 kernel  pc
    0x0038b3d4 s_execute0x291c(0x00beec00,
    0x00be2800, 0x29025200, 0x00000008, 0x00000000)
  • 0500000000742002/06/21 085821.30 kernel 
    Handler pc 0x003a4e50 s_handle installed by the
    following function-
  • 0500000000742002/06/21 085821.30 kernel  pc
    0x003a1e8c sequencer0x10d8(0x00be0400,
    0x00beec00, 0x29025200, 0x00005388, 0x2d3ccd58)

17
Stack Traces, Continued
  • 0500000000742002/06/21 085821.30 kernel  pc
    0x003921f8 execproc0x79c(0x00005388, 0x00007000,
    0x2d3ccd58, 0x00005000, 0x0000001b)
  • 0500000000742002/06/21 085821.30 kernel  pc
    0x0038b3d4 s_execute0x291c(0x00beec00,
    0x00be2800, 0x29025200, 0x00000008, 0x00000000)
  • 0500000000742002/06/21 085821.30 kernel 
    Handler pc 0x003a4e50 s_handle installed by the
    following function-
  • 0500000000742002/06/21 085821.30 kernel  pc
    0x003a1e8c sequencer0x10d8(0x00be0400,
    0x00beec00, 0x29025200, 0x00005388, 0x2d3ccd58)
  • 0500000000742002/06/21 085821.30 kernel  pc
    0x0011cf9c tdsrecv_language0xb4(0x00be0400,
    0x00befc00, 0x00000001, 0x00005000, 0x00005400)
  • 0500000000742002/06/21 085821.30 kernel 
    Handler pc 0x0013d9ec hdl_backout installed by
    the following function-
  • 0500000000742002/06/21 085821.31 kernel 
    Handler pc 0x0033fe1c ut_handle installed by
    the following function-
  • 0500000000742002/06/21 085821.31 kernel 
    Handler pc 0x0033fe1c ut_handle installed by
    the following function-

18
Stack Traces, Continued
  • 0500000000742002/06/21 085821.31 kernel  pc
    0x0012c6bc conn_hdlr0x20f0(0x00befc00,
    0x00be0400, 0x00000000, 0x000054e4, 0x00005400)
  • 0500000000742002/06/21 085821.31 kernel  pc
    0x00662618 _coldstart(0x00000081, 0x0012a5cc,
    0x00000000, 0x00000000, 0x00000000)
  • 0500000000742002/06/21 085821.31 kernel  end
    of stack trace, spid 74, kpid 311099515, suid 141

19
Good Information from Stack Traces
  • While most of the stack is only relevant for
    internal Sybase folks, some of it can be used to
    help match problems with exiting cases or bugs.
  • First, make note of the problem. Here it is an
    infected with 11
  • 0500000000742002/06/21 085821.17 kernel 
    current process (0x128b007b) infected with
    11
  • Next, note which function was executing when the
    signal arrived
  • 0500000000742002/06/21 085821.23 kernel 
    Address 0x0068b7f8 (getnext_ctlib_subst0x138),
    siginfo (code, address) (1, 0x0000010a)
  • Here the function getnext_ctlib_subst() is where
    the problem was encountered.
  • This line is only relevant for signal stack
    traces and doesnt apply to time slices or most
    other errors.

20
Good Information from Stack Traces, Continued
  • The SQL causing error can help you reproduce or
    avoid the problem.
  • 0500000000742002/06/21 085821.23 kernel  SQL
    causing error execute etcm.etcm.ap_get_ba_addres
    s1 _at_country 'CAN', _at_addr_type 'I', _at_basicba
    '999999', _at_addrsub '99', _at_baname '
  • Lines that follow contain the current database
    (pcurdb) as well as any errors that were
    encountered prior to the event that triggered the
    stack trace.

21
Picking Good Key Words for Research
  • Use good searching strategies, such as
    remembering synonyms.
  • When researching stack traces, search on
    functions near the top of the stack.
  • For infected with issues, try searching on the
    function in the address line following the
    infected with 11.
  • In the stack example we just looked at, we would
    search on getnext_ctlib_subst.
  • Search example on following slides

22
Picking Good Keywords for Research, Continued
  • Some ASE functions appear in almost every stack
    trace, and are therefore poor keywords to use
    when doing research
  • Avoid searching based on these functions
  • pcstkwalk
  • ucstkgentrace
  • ucbacktrace
  • os_backtrace
  • terminate_process
  • kisignal
  • hdl_default
  • hdl_backout
  • s_handle
  • execerr
  • aritherr
  • ex_raise
  • close_network
  • execute
  • s_execute
  • exec_eop
  • sequencer
  • s_compile
  • tdsrecv_language
  • conn_hdlr
  • _coldstart
  • kaclkintr
  • _fork
  • _getfp
  • Functions towards the top of the stack are more
    likely to be relevant than those on the bottom
  • If you match the bottom of the stack but not the
    top, you likely havent matched the problem.

23
Researching our Stack Trace
Step 1 Search solved cases
Keyword we have entered
24
Researching our Stack Trace, Continued
Step 2 Get the results
25
Researching our Stack Trace, Continued
Step 3 Click on the case number and see the
details
26
Researching our Stack Trace, Continued
Step 4 Search for CR details
Changed search type
27
Researching our Stack Trace, Continued
Step 5 Get more details on the CR
28
ASE Trace Flags
  • ASE has many, many trace flags that affect the
    server behavior.
  • Some are officially documented, some are not.
  • Some basic rules apply for all trace flags
  • Dont turn on a trace flag unless you understand
    what it does and what the side effects are.
  • Document who turned the flag on, when it was
    turned on, and why it was turned on.
  • Dont ever add a trace flag to a RUN_server file
    without adding a comment detailing why it is
    there.

29
ASE Trace Flags How they work
  • With few exceptions, trace flags are not tracked
    on a per-connection basis.
  • A single, server-wide, mask of activated trace
    flags is maintained.
  • When a connection turns on a trace flag, it will
    inherit the behavior of all other activate trace
    flags within the server.
  • Example
  • Spid 10 runs dbcc traceon (3604, 302, 310)
  • Spid 11 runs dbcc traceon (8399)
  • Trace 302 and 310 output will be generated for
    any query spid 11 runs. However, the output will
    not be sent back to the client since spid 11 did
    not activate trace 3604
  • Some trace flags will automatically apply to all
    connections, regardless of whether or not they
    have activated other flags.
  • Access to the ASE code is required to determine
    which flags have this behavior, therefore when in
    doubt you should consult Sybase Tech Support
  • Most flags that affect query processing fall into
    this category
  • The only flags that must appear in the RUN_server
    file are those that affect the boot-time behavior
    and initialization of the server.

30
The Sybmon Utility
ASEs advanced diagnostic facility
  • Diagnostic utility built into the dataserver
    binary
  • Provides access to shared memory structures
  • May be used to analyze live servers or dumped
    shared memory images of servers
  • Non-intrusive and requires no prior planning
    before invoking
  • The halt and restart commands are intrusive
    and will cause the server to suspend processing.
    However, these are the exceptions.
  • Sybmon dumps provide a snapshot image into a
    server
  • This snapshot is often times very useful in
    determining the cause of a problem
  • However, Sybmon dumps alone cannot answer every
    problem
  • Problems that occur over time (example loss of
    SIGALRMs) cant be seen with a snapshot
  • Problems that are occur outside of the shared
    memory segment cant be seen.

31
The Shared Memory Dump Facility (CSMD)
  • ASE has the ability to copy its shared memory to
    a file for examination by Sybase Tech Support
  • The shared memory dump can be manually triggered,
    or it can be event based
  • The event based dump is called a Configurable
    Shared Memory Dump or CSMD
  • CSMD can be configured for
  • A specific error or a group of errors, such as
    605 or 6xx
  • All errors with a severity equal to or greater
    than that configured
  • A signal, such as signal 10 or 11 (infected with
    10 or 11)
  • A timeslice violation
  • A panic / stack overflow
  • An unnumbered message that might appear in the
    log
  • Memory dumps can be analyzed by Sybase to
    determine the state of the server and the
    contents of memory structures at the time of a
    fault

32
sp_shmdumpconfig Stored Procedure
The interface to CSMD
  • The sp_shmdumpconfig stored procedure interfaces
    with the CSMD facility
  • Add or drop dump conditions
  • View configured dump conditions
  • View and modify dump defaults
  • To view current setup, run sp_shmdumpconfig with
    no arguments
  • Example on the next slide
  • For details on syntax and usage, see the notes
    section of this slide
  • Two relevant sp_configure parameters
  • dump on conditions set to 1 to turn on, set to
    0 (default) to turn off
  • max dump conditions sets the maximum number of
    dump conditions that can be configured (default
    10)

33
sp_shmdumpconfig Sample Output
  • gt sp_shmdumpconfig
  • gt go  
  • Type       Value Maxdumps Page_Cache Proc_Cache
    Unused_Space Est_File_Size
  •          Filename          Directory  
  • ---------- ----- -------- ---------- ----------
    ------------ -------------        
    ------------------------------         ----------
  • Error      813   Default  Default    Default   
    Default      13 MB
  • Default File Name          Default Directory  
  • Signal     10    Default  Include    Default   
    Default      22 MB
  •    Default File Name          Default Directory  
     
  • Timeslice  ---   Default  Default    Default   
    Default      13 MB
  •   Default File Name          Default Directory  
  • Panic      ---   Default  Default    Default   
    Default      13 MB
  •   Default File Name          Default Directory  
  • Defaults   ---   1        Omit       Omit      
    Omit         13 MB
  •   Generated File Name          /opt/sybase/csmd
  • Current number of conditions 4
  • Maximum number of conditions 10

34
Sybmon and CSMD What You Should Know
  • Sections of memory included in dump files are
    configurable
  • Core data structures are always included
  • Page cache, procedure cache, and unused space are
    optional
  • Inclusion of procedure cache is recommended
  • Dump files can be very large
  • Size is based on the amount of memory ASE is
    configured for and the optional memory modes
  • ASE processing will stop while the dump file is
    being written
  • This is done to achieve a consistent image of
    memory in the dump file
  • Time of the halt is based on the size of the dump
    and the speed of the I/O system
  • Specify a dump directory, but dont specify a
    file name
  • It is best to let ASE generate the name for the
    dump file. This prevents dump files from being
    overwritten
  • Always FTP dump files in binary mode

35
Should You Be Proactive With CSMD?
  • Some shops setup CSMD before they encounter
    problems to help speed resolution should they
    encounter any
  • Do this at your own discretion.
  • Keep in mind that generating a CSMD causes ASE to
    halt while the file is being written
  • Some conditions to consider
  • Signals 4 (very rare), 10, 11 (signal 0xC0000005
    on NT)
  • Timeslice, panic
  • Fatal errors (configure for severity 19)
  • Keep in mind that the dump file may not be that
    useful and Tech Support may not be interested in
    it
  • Based on the specifics of the issue the CSMD may
    not be the best way to approach the problem

36
Agenda
  • Introduction
  • Problem Solving Resources
  • Working with Sybase Tech Support
  • Change Request / Bug Fix Process
  • Understanding ASE Diagnostics
  • Finding information in error messages
  • Decoding stack traces
  • Picking good key words for research
  • Trace Flags
  • The shared memory dump facility
  • Types of Problems
  • Performance Issues
  • Locking / Blocking / Deadlocking
  • Low-Level Task Related Problems
  • Backup and Recovery Issues

37
Performance Problems
  • Planning ahead greatly simplifies the resolution
    of performance problems.
  • Due to the variances in applications and
    environment, it is difficult to determine what is
    normal and what is abnormal for a particular
    customer.
  • Gathering benchmark data prior to encountering a
    problem allows for the isolation of problems

38
Basic Performance Troubleshooting
  • Isolation query issues from system issues
  • Optimizer issue, blocking issue, or resource
    issue?
  • Gather data over time
  • Establish baselines for all levels of activity
  • Understand typical system loads at various times
  • Look for trends
  • Avoid tuning based on a single 5 minute sample
  • Understand what you are changing
  • Read documentation before tuning anything
  • Document what you did and why you did it
  • Dont ever turn on a trace flag until you know
    what it does and what the side effects are!

39
Tools to Consider for General Problems
  • sp_sysmon
  • Very good at giving an overall view of what the
    server is doing.
  • Not so good at narrowing down to specific users,
    queries, etc.
  • Well documented and easy to use, but can be
    difficult to find meaningful data.
  • Monitor and historical servers
  • Provides data sp_sysmon cant
  • Better at monitoring specific users, queries,
    objects, etc.
  • Requires separate setup, but this is trivial
  • Can serve client apps written with the Monitor
    Client Library, such as Sybase central plug-in
  • Shipped as part of ASE, no separate licenses
    required
  • DBXray
  • New optional feature for ASE 12.5.0.3
  • Provides real-time, graphical monitoring of ASE
  • Other external tools
  • OS level tools such as iostat, vmstat, sar, etc.
  • In extreme cases, tools such as truss can be used
    to determine the amount of time spent in
    individual system calls, but this adds a lot of
    overhead.

40
Guidelines for Using sp_sysmon
  • Archive your output
  • If performance drops off, archived output will be
    very useful
  • Run sysmon prior to major changes
  • Hardware
  • Application
  • Configuration parameters
  • Run sysmon for short periods of time
  • Dont overflow monitor counters (unsigned int)
  • A sample that includes both busy and idle time
    may reveal neither.
  • Look at per second and per transaction numbers
  • Better than raw numbers for making comparisons
  • Be careful of conflicts
  • Monitor and historical server share counters with
    sysmon
  • Run only a single sysmon at a time
  • Be careful of automated recommendations
  • Newer versions make recommendations
  • Look at the system over time and consider
    implications prior to following any of them.

41
Locking / Blocking Tempdb Contention
As the overall scalability of ASE dramatically
improved, tempdb became a bottleneck
  • Why is this a problem?
  • Creating and dropping tables requires exclusive
    locks on system catalogs
  • High throughput environments create / drop many
    temp tables
  • This is normally fast, but having to do physical
    IO while holding these locks can queue up other
    processes
  • Becomes a point of serialization in a high
    throughput environment

42
Tempdb Contention, Continued
Types of temporary tables
  • Procedural tables
  • Created inside of a stored procedure
  • Visible only to that procedure and sub procedures
  • Automatically dropped when proc exists
  • Session tables
  • Created outside of a proc using tablename
  • Visible only to that session
  • Automatically dropped when that session exists
  • Shareable tables
  • A regular table that lives in tempdb
  • Visible to all users
  • Will not be automatically dropped (except for ASE
    reboot)

Contention
43
Tempdb Contention, Continued
Some things to try
  • On the application side, reduce the number of
    tables created
  • Replace procedural tables with session tables
  • proc can truncate the table at entry
  • Have multiple sessions share the same regular
    table
  • Row lock this table
  • Requires additional logic at the application
    level
  • One the server side, reduce latency while holding
    locks
  • Place tempdb on a ramdisk
  • Use named caches to take load out of default data
    cache

44
Tempdb Contention, Continued
Common Misconceptions
  • Performing create followed by insertselect is
    better than doing select into
  • Based on the belief that ASE holds the system
    table locks for duration of select into
  • Reality is select into releases catalog locks
    prior to inserting rows
  • Select into also faster due to reducing logging
  • Truncating a table prior to dropping it reduces
    time locks are held
  • This is absolutely true. During drop we must
    deallocate all extents while holding the system
    table locks.
  • However, ASE already truncates temp tables prior
    to grabbing system table locks (introduced in
    11.9.2)
  • Traceflag 3703 disables this, and it can increase
    contention. A common misconception is that it
    will reduce contention.

45
Tempdb Contention, Continued
Big Improvements in 12.5.0.3
  • Multiple tempdbs spread the load
  • 12.5.0.3 allows DBAs to create and assign
    multiple tempdbs, based on application, login,
    etc.
  • Spreads the load and dramatically improves
    throughput
  • Lazy log writing reduces physical IO
  • Log buffers are not flushed to disk during commit
    processing in temporary databases
  • Data buffers not flushed to disk at end of select
    into where target table is in a temporary
    database

46
Deadlocks
  • Deadlocks are almost always an application issue.
  • Get details by turning on print deadlock
    information
  • sp_configure option
  • Replaces the older 1204 traceflag for every day
    troubleshooting
  • Traceflags can provided additional details
  • 1204 prints deadlock chains (similar to print
    deadlock information above). This is required
    for other deadlock traceflags, so it should only
    be used when the trace flags below are required.
  • 1205 prints a stack trace for each task in the
    deadlock chain. Tech support may request this in
    rare circumstances (requires the use of trace
    1204)
  • 1218 prints the query plan (showplan) upon a
    deadlock. (requires the use of trace 1204)
  • My favorite way to understand deadlocks is to
    diagram them.
  • Note 12.5.0.3 MDA tables also contain deadlock
    information

47
Other Lock Troubleshooting Tips
  • Traceflag 1202 will insert blocked lock requests
    into syslocks
  • Normally blocked requests do not go to syslocks,
    and therefore are not displayed by sp_lock
  • 1202 causes them to be displayed for additional
    troubleshooting
  • Traceflags 1212 and 1217 provide extensive lock
    tracing
  • 1212 traces every lock acquisition and release
  • 1217 traces lock acquisition and release for user
    tables only (same as 1212, but doesnt display
    system table locks)
  • Both of these traces provide A LOT of output, but
    may be helpful in isolated test and development
    environments
  • May provide a better idea of how the locking
    system is handling your query

48
Low-Level Task Related Problems
Things that make your process go boom
  • A task may be terminated due to some internal
    problems
  • Time slice violations
  • Operating system signals
  • Stack and stack guard overflows
  • Normally the task will be terminated and ASE will
    move on. However
  • If the terminating task is holding a spinlock,
    ASE will shut itself down.
  • Stack overflow issues will always result in an
    ASE shutdown

49
Low-Level Task Related Problems Time Slice
Time slice violations prevent a run-away task
from taking over your server
  • ASE uses a non-preemptive scheduling algorithm
  • Each scheduled task is responsible for
    voluntarily or automatically scheduling itself
    out.
  • Every task is given a time quantum to run in,
    along with a grace time.
  • If a task fails to yield after this grace time is
    exhausted, it will be killed.
  • Tasks normally run for a very short time,
    yielding when
  • The task must sleep for a resource, such a
    network or physical I/O
  • Execution has completed and the task becomes idle
  • They have exhausted their time quantum and are
    executing in the grace period. ASE code
    frequently checks to see if the time quantum has
    been exhausted and if the task should yield.
  • Tasks may fail to yield within the grace period
    when
  • The task is caught in a loop that does not
    contain any yield point
  • A long code path is being executed that does not
    contain a necessary yield point
  • An operating system call that was expected to
    take a short period of time takes a long time

50
Time Slice - Yielding
51
Time Slice Errors
52
Time Slice Errors, Continued
What should you do if you get a time slice error?
  • Two sp_configure parameters are commonly
    referenced
  • time slice -gt typically this should not be
    changed as it will have implications for every
    task.
  • cpu grace time -gt this can be increased to give
    a task more time to execute before being killed.
    This will only impact processes that are running
    for an extremely long period of time.
  • Tweaking parameters is easy, but not always
    fruitful
  • The default grace time of 500 already gives a
    task 50 seconds to yield, which should be more
    than enough time.
  • If it doesnt yield in 50 seconds, it probably
    wont yield in 75 or 100 seconds, and you will
    have tied up the engine for that much longer
  • Finding a solution
  • Try to isolate the behavior, i.e. errors occur on
    RPCs to a specific site.
  • Pick good key words from the stack and check the
    solved cases database.
  • Work with Sybase Tech Support to find the root
    cause. In some cases, a configured shared memory
    dump may help.

53
Low-Level Task Related Problems - Signals
ASE can survive most signals
  • ASE tasks may receive fatal signals from the
    operating system.
  • At boot time ASE installs handlers for many of
    these signals so that it can respond
    appropriately.
  • ASE prefers not to dump core and exit. Rather,
    it will terminate its internal task that
    generating the signal and continue scheduling
    work on that engine.
  • A message in the error log will read Current
    process infected with and a stack trace will
    be generated.
  • Signals you may see
  • Infected with (signal) 11 / Segmentation
    Violation an attempt was made to read or write
    memory outside of ASEs valid address space.
  • Infected with (signal) 10 / Bus Error similar to
    signal 11, but may be related to a misaligned
    attempt to access memory.
  • On the Windows platform, you would see a Storage
    Access Violation with code 0xC0000005

54
Signals, Continued
Dealing with signal problems
  • Causes of fatal signals
  • Coding error within ASE
  • Operating system / hardware problem
  • Improperly coded applications should not result
    in ASE performing illegal memory access.
  • Be sure to see if this is a known problem
  • Check solved cases and open bugs, using good key
    words from the stack trace.
  • Consider a newer EBF is your release is
    significantly behind.
  • Make sure your OS patches meet those required by
    the release bulletin, as well as those
    recommended by your OS vendor.
  • Reproduce, reproduce, reproduce
  • It is very difficult to develop a code solution
    to a signal without a reproduction.
  • Use information such a the SQL Causing Error,
    application, and login to try to reproduce the
    problem.
  • Tech support can and will assist without a
    reproduction, and their efforts will largely
    focus on developing a reproduction.
  • Shared memory dumps are very common in these
    situations, and may help support in developing a
    reproduction.

55
Low-Level Task Related Problems Stacks
ASE must monitor its own stacks to look for
overflows
  • ASE maintains stack space for its tasks
  • The per-task allocation consists of space for the
    stack, followed by a guard word.
  • The guard word allows a task to overflow its
    stack without affecting adjacent memory.
    Normally, it is empty space.
  • If the guard word is not empty, or if it has been
    overflowed, ASE considers memory to be corrupt
    and panics, shutting itself down immediately.
  • What to do
  • The size of the stack and the guard word are
    independently configurable. If you must make an
    adjustment, change the stack size and not the
    guard word.
  • Again, check OS patches. This is especially
    important on Solaris.
  • A shared memory dump for the panic condition can
    be gathered. This allows support to examine the
    raw stacks.
  • If this is a recurring problem, try to isolate
    specific queries.

56
Backup and Recovery Problems
  • Validate your backups!!!
  • Most major backup and recovery issues occur when
    a customer is forced to go to backup and then
    finds out their backups cannot be loaded.
  • My advice do not consider you database backed-up
    unless you have successfully tested a load of the
    dump.
  • This can be an expensive bit of overhead on VLDB
    systems, but it is a very good investment!
  • There is very little that anybody can do to get a
    corrupted database or transaction dump to load.

57
Thank You
  • Questions?
Write a Comment
User Comments (0)
About PowerShow.com