Thomas E' Canty ServerCare, Inc' Session - PowerPoint PPT Presentation

1 / 44
About This Presentation
Title:

Thomas E' Canty ServerCare, Inc' Session

Description:

... Technology, Pharmaceuticals, and Telecom, as well as Major Universities ... Disaster recovery & HA. Logical Standby. Physical Standby. Feature. Outline. Overview ... – PowerPoint PPT presentation

Number of Views:17
Avg rating:3.0/5.0
Slides: 45
Provided by: aka110
Category:

less

Transcript and Presenter's Notes

Title: Thomas E' Canty ServerCare, Inc' Session


1
Thomas E. CantyServerCare, Inc.Session 126
Data Guard Best Practices Tuning
2
Speaker Qualifications
  • Thomas E. Canty, Senior Oracle DBA, ServerCare,
    Inc.
  • 19 years of Oracle experience, starting with
    version 5
  • Has presented at IOUG, OpenWorld, NoCOUG, IASA,
  • Has been a DBA, Developer, Architect, and IT
    Manager
  • Has worked with Fortune 100 companies in
    Healthcare, Technology, Pharmaceuticals, and
    Telecom, as well as Major Universities

888-918-6309 http//www.ServerCare.com
3
Outline
  • Overview
  • Network Optimization
  • ARCn LGWR Redo Transport
  • Checkpoint, Redo Read/Apply Recovery
  • Wait Events
  • 10g R2 11g Improvements
  • Best Practices

4
Data Guard Modes
  • Maximum Performance Mode
  • Least performance impact
  • Default mode
  • Maximum Protection Mode
  • Emphasis on data safety
  • Requires at least one secondary
  • Maximum Availability Mode
  • Emphasis on uptime
  • Continues if secondary unavailable

5
Physical vs. Logical Standby
6
Outline
  • Overview
  • Network Optimization
  • ARCn LGWR Redo Transport
  • Checkpoint, Redo Read/Apply Recovery
  • Wait Events
  • 10g R2 11g Improvements
  • Best Practices

7
Session Data Unit (SDU)
  • In Oracle Net connect descriptor
  • sales.servercare.com
  • (DESCRIPTION
  • (SDU32767)
  • (ADDRESS(PROTOCOLtcp)
  • (HOSTsales-server)(PORT1521))
  • (CONNECT_DATA
  • (SIDsales.servercare.com)))
  • Globally in sqlnet.ora
  • DEFAULT_SDU_SIZE32767

8
Session Data Unit (SDU) (Cont.)
  • On standby DB, set in listener.ora
  • SID_LIST_listener_name
  • (SID_LIST
  • (SID_DESC
  • (SDU32767)
  • (GLOBAL_DBNAMEsales.servercare.com)
  • (SID_NAMEsales)
  • (ORACLE_HOME/u01/app/oracle/product/10.2.0/db_1))
    )

9
TCP Socket Buffer Size
  • Set TCP socket buffer size 3 BDP
  • Data Guard broker config. Set in sqlnet.ora
  • Non Data Guard broker set in connect descriptor
  • BDP - Bandwidth Delay Product
  • RTT- Round Trip Time

10
TCP Socket Buffer Size
  • Assume gigabit network with RTT 25 ms
  • BDP 1,000 Mbps 25msec (.025 sec)
  • 1,000,000,000 .025
  • 25,000,000 Megabits / 8 3,125,000 bytes
  • In this example
  • socket buffer size 3 bandwidth delay
  • 3,125,000 3
  • 9,375,000 bytes
  • sqlnet.ora
  • RECV_BUF_SIZE9375000
  • SEND_BUF_SIZE9375000

11
Network Queue Sizes
  • Between kernel net. subsystems NIC driver
  • txqueuelen - transmit queue size
  • netdev_max_backlog - receive queue size
  • Assumes gigabit network with 100ms latency
  • Set queues
  • ifconfig eth0 txqueuelen 10000
  • sysctl.conf
  • net.core.netdev_max_backlog20000

12
Overall Network
  • Ensure sufficient bandwidth to standby
  • Verify TCP_NODELAY set to YES (default)
  • RHEL3 - increase /proc/sys/fs/aio-max-size on
    standby
  • From 131072(default) to 1048576
  • Set RECV_BUF_SIZE SEND_BUF_SIZE 3 Bandwidth
    Delay Product (BDP)
  • Use Session Data Unit (SDU) size of 32767
  • Increase send receive queue sizes
  • TXQUEUELENGTH
  • NET_DEV_MAX_BACKLOG

13
Outline
  • Overview
  • Network Optimization
  • ARCn LGWR Redo Transport
  • Checkpoint, Redo Read/Apply Recovery
  • Wait Events
  • 10g R2 11g Improvements
  • Best Practices

14
ARCn Redo Transport
1) Read from local arch 2) Receive redo 3) Ack -
know -ledge
15
ASYNC LGWR Redo Transport
1) Write local redo 2) ASYNC send redo 3) Receive
redo 4) Ack - know -ledge 5) Write stdby redo
16
SYNC LGWR Redo Transport
1) Write local redo 2) SYNC send redo 3) Receive
redo 4) Ack - know -ledge 5) Post receipt to LGWR
17
Optimize ARCn Transport
  • Increase MAX_CONNECTIONS to 5 on standby (if
    possible)
  • default (2), maximum (5)
  • Increase LOG_ARCHIVE_MAX_PROCESSES
  • Larger than MAX_CONNECTIONS
  • Up to network bandwidth
  • default (2), maximum (30)

18
Optimize LGWR Transport
  • Decrease NET_TIMEOUT (default 180 secs.)
  • Be careful! - Not too low
  • New COMMITS
  • COMMIT IMMEDIATE WAIT (default)
  • COMMIT NOWAIT
  • COMMIT NOWAIT BATCH

19
All Redo Transport
  • Standby redo logs
  • Use fastest disks
  • No RAID5
  • Dont multiplex
  • Use the recommended number of SRLs
  • (maximum of online logfiles 1) maximum of
    threads

20
Outline
  • Overview
  • Network Optimization
  • ARCn LGWR Redo Transport
  • Checkpoint, Redo Read/Apply Recovery
  • Wait Events
  • 10g R2 11g Improvements
  • Best Practices

21
Checkpoint Phase
  • Checkpoint occurs
  • During log switch
  • LOG_CHECK_TIMEOUT expiration
  • LOG_CHECKOUT_INTERVAL reached
  • Reduce log switch interval
  • Resize redo log to 1GB - primary and secondary
  • Recommended - checkpoint every 15 minutes

22
Checkpoint Phase (Cont.)
  • Determine checkpoint frequency
  • COL NAME FOR A35
  • SELECT NAME, VALUE, TO_CHAR(SYSDATE, HHMISS)
    TIME
  • FROM VSYSSTAT WHERE NAME 'DBWR checkpoints'
  • NAME VALUE
    TIME
  • ----------------------------------- ----------
    --------
  • DBWR checkpoints 264
    081543
  • SQLgt /
  • NAME VALUE
    TIME
  • ----------------------------------- ----------
    --------
  • DBWR checkpoints 267
    083406

23
Redo Read (Secondary)
  • Obtain read rate for the standby redo log
  • SQLgt ALTER SYSTEM DUMP LOGFILE
    '/u01/oradata/docprd/sredo01.log validate
  • System altered.
  • vi docprd_ora_3560.trc
  • Mon Mar 12 085952 2007
  • ----- Redo read statistics for thread 1 -----
  • Read rate (ASYNC) 4527Kb in 0.58s gt 6.90
    Mb/sec
  • Longest record 19Kb, moves 0/7586 (0)
  • Change moves 4340/18026 (24), moved 2Mb
  • Longest LWN 92Kb, moves 1/1365 (0), moved
    0Mb
  • Last redo scn 0x0000.01272351 (19342161)

24
Redo Apply (Secondary)
  • Goal
  • Redo apply rate (secondary) gt Redo create rate
    (primary)
  • Carefully consider enabling DB_BLOCK_CHECKING
  • LOW, MEDIUM and FULL options
  • Possible performance impact

25
Redo Apply (Cont.)
  • Determine Log Block Size (LEBSZ)
  • SELECT LEBSZ FROM XKCCLE WHERE ROWNUM1
  • Get recovery blocks - at least two snapshots
  • Managed Recovery Case
  • SELECT PROCESS, SEQUENCE, THREAD, block,
    BLOCKS,
  • TO_CHAR(SYSDATE, 'DD-MON-YYYY HHMISS') time
  • from vMANAGED_STANDBY WHERE PROCESS'MRP0'
  • Determine the recovery rate (MB/sec) for a
    specific archive sequence number
  • Managed Recovery Case
  • ((BLOCK_END - BLOCK_BEG) LOG_BLOCK_SIZE) /
  • (TIME_END - TIME_BEG) 1024 1024

26
Redo Apply (Cont.)
  • Oracle Recommends

27
Recovery
  • Parallel Recovery (before 10.1.0.5)
  • Set to number of CPUs
  • recover managed standby database parallel ltgt
  • PARALLEL_EXECUTION_MESSAGE_SIZE
  • Can increase to 4096 or 8192
  • Uses additional shared pool memory
  • Problems if set too high
  • DB_CACHE_SIZE
  • Can set secondary DB_CACHE_SIZE gt primary
  • Must set to primary before changing roles

28
Outline
  • Overview
  • Network Optimization
  • ARCn LGWR Redo Transport
  • Checkpoint, Redo Read/Apply Recovery
  • Wait Events
  • 10g R2 11g Improvements
  • Best Practices

29
Arch Wait Events - Primary
  • ARCH wait on ATTACH
  • Time for all arch processes to spawn RFS
    connection
  • ARCH wait on SENDREQ
  • Time for all arch processes to write received
    redo to disk open close remote archived redo
    logs
  • ARCH wait on DETACH
  • Time for all arch processes to delete RFS
    connection

30
LGWR SYNC Wait Events - Primary
  • LGWR wait on ATTACH
  • Time for all log writer processes to spawn RFS
    connection
  • LGWR wait on SENDREQ
  • Time for all log writer processes to write
    received redo to disk open close the remote
    archived redo logs
  • LGWR wait on DETACH
  • Time for all log writer processes to delete RFS
    conn.

31
LGWR ASYNC Wait Events - Primary
  • LNS wait on ATTACH
  • Time for all network servers to spawn RFS
    connection
  • LNS wait on SENDREQ
  • Time for all network servers to write received
    redo to disk open close the remote archived
    redo logs
  • LNS wait on DETACH
  • Time for all network servers to delete RFS conn.
  • LGWR wait on full LNS buffer
  • Time for log writer (LGWR) process awaiting for
    network server (LNS) to free ASYNC buffer space

32
Wait Events on Secondary
  • RFS Write
  • Time to write to standby redo log or archive log
    non I/O work like redo block checksum
    validation
  • RFS Random I/O
  • Time to write to a standby redo log to occur
  • RFS Sequential I/O
  • Time to write to an archive log to occur

33
Outline
  • Overview
  • Network Optimization
  • ARCn LGWR Redo Transport
  • Checkpoint, Redo Read/Apply Recovery
  • Wait Events
  • 10g R2 11g Improvements
  • Best Practices

34
10g R2 Improvements
  • Multiple archive processes can transmit a redo
    log in parallel to the standby database
  • MAX_CONNECTIONS attribute of the
    LOG_ARCHIVE_DEST_n controls the number of these
    processes
  • Parallel Recovery for Redo apply is automatically
    set equal to number of CPUs
  • 10.1.0.5 and 10.2.0.1
  • Fast-Start Failover
  • Automatically fails over to a previously chosen
    physical standby database

35
10g R2 Improvements (Cont.)
  • LGWR ASYNC
  • Uses a new process (LNSn) to transmit the redo
    data directly from the online redo log to the
    standby database
  • Physical standby database flashback
  • Can flash back temporarily for reporting
  • Logical standby database
  • Automatically deletes applied archived log
  • RMAN
  • Automatically creates temp datafiles after
    recovery

36
11g Improvements
  • Physical standby database open read/write for
    test or other purposes with zero compromise in
    data protection using new Snapshot Standby
  • Automatic failover configurable for immediate
    response to designated events or errors
  • More flexibility in primary/standby
    configurations
  • e.g. Windows primary and Linux standby
  • Rolling upgrade options now in physical standby
    with Transient Logical Standby
  • ASYNC transport enhanced to eliminate the impact
    of latency on network throughput

37
11g Improvements (Cont.)
  • Fast detection of corruptions caused by lost
    writes in the storage layer
  • SQL Apply supports XML data type (CLOB)
  • Many performance, manageability, and security
    enhancements
  • Support for new Oracle Database 11g Options
    Oracle Active Data Guard and Oracle Advanced
    Compression
  • Fast Start Failover now available for Maximum
    Performance mode

38
Outline
  • Overview
  • Network Optimization
  • ARCn LGWR Redo Transport
  • Checkpoint, Redo Read/Apply Recovery
  • Wait Events
  • 10g R2 11g Improvements
  • Best Practices

39
Best Practices
  • Geographically separate primary standby DB
  • Ensure standby hardware configuration same as the
    primary
  • Tune standby for write intensive operations
  • Test Data Guard before deploy in production
  • Set standard OS and DB parameters to recommended
    values
  • Perform switchover testing
  • Fully document a failover procedure
  • Use FORCE LOGGING mode

40
Best Practices (Cont.)
  • Use real-time apply
  • Use the Data Guard Broker
  • Enable Flashback Database on both primary and
    secondary databases
  • Evaluate using AFFIRM attribute
  • Possible performance issues on primary
  • Verify Asynchronous I/O enabled
  • Carefully consider DB_BLOCK_CHECKING

41
Best Practices (Cont.)
  • Dont multiplex standby redo logs (SRLs)
  • Correctly set number of SRLs
  • Increase PARALLEL_EXECUTION_MESSAGE_SIZE
  • Place SRLs in fast disk group or disks
  • Use at lease two standby DBs with Maximum
    Protection Mode
  • Utilize COMMIT NOWAIT if appropriate

42
Best Practices (Cont.)
  • Ensure appropriate bandwidth between primary and
    secondary
  • Increase default send receive queue sizes
  • TXQUEUELENGTH
  • NET_DEV_MAX_BACKLOG
  • Session Data Unit
  • Adjust value to 32767
  • Improvement during large data transmissions

43
Questions?
  • Lots of things we didnt cover
  • If we dont cover something you wanted to hear,
    please contact me.

44
Session 126 Data Guard Best Practices Tuning
THANK YOU!
Please fill out evaluations!
Email Tom Canty tom.canty_at_servercare.com
Or Call 888-918-6309 http//www.ServerCare.com
Write a Comment
User Comments (0)
About PowerShow.com