Thomas E' Canty ServerCare, Inc' Session presentation

About This Presentation

Transcript and Presenter's Notes

Title: Thomas E' Canty ServerCare, Inc' Session

1
Thomas E. CantyServerCare, Inc.Session 126
Data Guard Best Practices Tuning
2
Speaker Qualifications

Thomas E. Canty, Senior Oracle DBA, ServerCare,
Inc.
19 years of Oracle experience, starting with
version 5
Has presented at IOUG, OpenWorld, NoCOUG, IASA,
Has been a DBA, Developer, Architect, and IT
Manager

Has worked with Fortune 100 companies in
Healthcare, Technology, Pharmaceuticals, and
Telecom, as well as Major Universities

888-918-6309 http//www.ServerCare.com
3
Outline

Overview
Network Optimization
ARCn LGWR Redo Transport
Checkpoint, Redo Read/Apply Recovery
Wait Events
10g R2 11g Improvements
Best Practices

4
Data Guard Modes

Maximum Performance Mode
Least performance impact
Default mode
Maximum Protection Mode
Emphasis on data safety
Requires at least one secondary
Maximum Availability Mode
Emphasis on uptime
Continues if secondary unavailable

5
Physical vs. Logical Standby
6
Outline

Overview
Network Optimization
ARCn LGWR Redo Transport
Checkpoint, Redo Read/Apply Recovery
Wait Events
10g R2 11g Improvements
Best Practices

7
Session Data Unit (SDU)

In Oracle Net connect descriptor
sales.servercare.com
(DESCRIPTION
(SDU32767)
(ADDRESS(PROTOCOLtcp)
(HOSTsales-server)(PORT1521))
(CONNECT_DATA
(SIDsales.servercare.com)))
Globally in sqlnet.ora
DEFAULT_SDU_SIZE32767

8
Session Data Unit (SDU) (Cont.)

On standby DB, set in listener.ora
SID_LIST_listener_name
(SID_LIST
(SID_DESC
(SDU32767)
(GLOBAL_DBNAMEsales.servercare.com)
(SID_NAMEsales)
(ORACLE_HOME/u01/app/oracle/product/10.2.0/db_1))
)

9
TCP Socket Buffer Size

Set TCP socket buffer size 3 BDP
Data Guard broker config. Set in sqlnet.ora
Non Data Guard broker set in connect descriptor
BDP - Bandwidth Delay Product
RTT- Round Trip Time

10
TCP Socket Buffer Size

Assume gigabit network with RTT 25 ms
BDP 1,000 Mbps 25msec (.025 sec)
1,000,000,000 .025
25,000,000 Megabits / 8 3,125,000 bytes
In this example
socket buffer size 3 bandwidth delay
3,125,000 3
9,375,000 bytes
sqlnet.ora
RECV_BUF_SIZE9375000
SEND_BUF_SIZE9375000

11
Network Queue Sizes

Between kernel net. subsystems NIC driver
txqueuelen - transmit queue size
netdev_max_backlog - receive queue size
Assumes gigabit network with 100ms latency
Set queues
ifconfig eth0 txqueuelen 10000
sysctl.conf
net.core.netdev_max_backlog20000

12
Overall Network

Ensure sufficient bandwidth to standby
Verify TCP_NODELAY set to YES (default)
RHEL3 - increase /proc/sys/fs/aio-max-size on
standby
From 131072(default) to 1048576
Set RECV_BUF_SIZE SEND_BUF_SIZE 3 Bandwidth
Delay Product (BDP)
Use Session Data Unit (SDU) size of 32767
Increase send receive queue sizes
TXQUEUELENGTH
NET_DEV_MAX_BACKLOG

13
Outline

Overview
Network Optimization
ARCn LGWR Redo Transport
Checkpoint, Redo Read/Apply Recovery
Wait Events
10g R2 11g Improvements
Best Practices

14
ARCn Redo Transport
1) Read from local arch 2) Receive redo 3) Ack -
know -ledge
15
ASYNC LGWR Redo Transport
1) Write local redo 2) ASYNC send redo 3) Receive
redo 4) Ack - know -ledge 5) Write stdby redo
16
SYNC LGWR Redo Transport
1) Write local redo 2) SYNC send redo 3) Receive
redo 4) Ack - know -ledge 5) Post receipt to LGWR
17
Optimize ARCn Transport

Increase MAX_CONNECTIONS to 5 on standby (if
possible)
default (2), maximum (5)
Increase LOG_ARCHIVE_MAX_PROCESSES
Larger than MAX_CONNECTIONS
Up to network bandwidth
default (2), maximum (30)

18
Optimize LGWR Transport

Decrease NET_TIMEOUT (default 180 secs.)
Be careful! - Not too low
New COMMITS
COMMIT IMMEDIATE WAIT (default)
COMMIT NOWAIT
COMMIT NOWAIT BATCH

19
All Redo Transport

Standby redo logs
Use fastest disks
No RAID5
Dont multiplex
Use the recommended number of SRLs
(maximum of online logfiles 1) maximum of
threads

20
Outline

Overview
Network Optimization
ARCn LGWR Redo Transport
Checkpoint, Redo Read/Apply Recovery
Wait Events
10g R2 11g Improvements
Best Practices

21
Checkpoint Phase

Checkpoint occurs
During log switch
LOG_CHECK_TIMEOUT expiration
LOG_CHECKOUT_INTERVAL reached
Reduce log switch interval
Resize redo log to 1GB - primary and secondary
Recommended - checkpoint every 15 minutes

22
Checkpoint Phase (Cont.)

Determine checkpoint frequency
COL NAME FOR A35
SELECT NAME, VALUE, TO_CHAR(SYSDATE, HHMISS)
TIME
FROM VSYSSTAT WHERE NAME 'DBWR checkpoints'
NAME VALUE
TIME
----------------------------------- ----------
--------
DBWR checkpoints 264
081543
SQLgt /
NAME VALUE
TIME
----------------------------------- ----------
--------
DBWR checkpoints 267
083406

23
Redo Read (Secondary)

Obtain read rate for the standby redo log
SQLgt ALTER SYSTEM DUMP LOGFILE
'/u01/oradata/docprd/sredo01.log validate
System altered.
vi docprd_ora_3560.trc
Mon Mar 12 085952 2007
----- Redo read statistics for thread 1 -----
Read rate (ASYNC) 4527Kb in 0.58s gt 6.90
Mb/sec
Longest record 19Kb, moves 0/7586 (0)
Change moves 4340/18026 (24), moved 2Mb
Longest LWN 92Kb, moves 1/1365 (0), moved
0Mb
Last redo scn 0x0000.01272351 (19342161)

24
Redo Apply (Secondary)

Goal
Redo apply rate (secondary) gt Redo create rate
(primary)
Carefully consider enabling DB_BLOCK_CHECKING
LOW, MEDIUM and FULL options
Possible performance impact

25
Redo Apply (Cont.)

Determine Log Block Size (LEBSZ)
SELECT LEBSZ FROM XKCCLE WHERE ROWNUM1
Get recovery blocks - at least two snapshots
Managed Recovery Case
SELECT PROCESS, SEQUENCE, THREAD, block,
BLOCKS,
TO_CHAR(SYSDATE, 'DD-MON-YYYY HHMISS') time
from vMANAGED_STANDBY WHERE PROCESS'MRP0'
Determine the recovery rate (MB/sec) for a
specific archive sequence number
Managed Recovery Case
((BLOCK_END - BLOCK_BEG) LOG_BLOCK_SIZE) /
(TIME_END - TIME_BEG) 1024 1024

26
Redo Apply (Cont.)

Oracle Recommends

27
Recovery

Parallel Recovery (before 10.1.0.5)
Set to number of CPUs
recover managed standby database parallel ltgt
PARALLEL_EXECUTION_MESSAGE_SIZE
Can increase to 4096 or 8192
Uses additional shared pool memory
Problems if set too high
DB_CACHE_SIZE
Can set secondary DB_CACHE_SIZE gt primary
Must set to primary before changing roles

28
Outline

Overview
Network Optimization
ARCn LGWR Redo Transport
Checkpoint, Redo Read/Apply Recovery
Wait Events
10g R2 11g Improvements
Best Practices

29
Arch Wait Events - Primary

ARCH wait on ATTACH
Time for all arch processes to spawn RFS
connection
ARCH wait on SENDREQ
Time for all arch processes to write received
redo to disk open close remote archived redo
logs
ARCH wait on DETACH
Time for all arch processes to delete RFS
connection

30
LGWR SYNC Wait Events - Primary

LGWR wait on ATTACH
Time for all log writer processes to spawn RFS
connection
LGWR wait on SENDREQ
Time for all log writer processes to write
received redo to disk open close the remote
archived redo logs
LGWR wait on DETACH
Time for all log writer processes to delete RFS
conn.

31
LGWR ASYNC Wait Events - Primary

LNS wait on ATTACH
Time for all network servers to spawn RFS
connection
LNS wait on SENDREQ
Time for all network servers to write received
redo to disk open close the remote archived
redo logs
LNS wait on DETACH
Time for all network servers to delete RFS conn.
LGWR wait on full LNS buffer
Time for log writer (LGWR) process awaiting for
network server (LNS) to free ASYNC buffer space

32
Wait Events on Secondary

RFS Write
Time to write to standby redo log or archive log
non I/O work like redo block checksum
validation
RFS Random I/O
Time to write to a standby redo log to occur
RFS Sequential I/O
Time to write to an archive log to occur

33
Outline

Overview
Network Optimization
ARCn LGWR Redo Transport
Checkpoint, Redo Read/Apply Recovery
Wait Events
10g R2 11g Improvements
Best Practices

34
10g R2 Improvements

Multiple archive processes can transmit a redo
log in parallel to the standby database
MAX_CONNECTIONS attribute of the
LOG_ARCHIVE_DEST_n controls the number of these
processes
Parallel Recovery for Redo apply is automatically
set equal to number of CPUs
10.1.0.5 and 10.2.0.1
Fast-Start Failover
Automatically fails over to a previously chosen
physical standby database

35
10g R2 Improvements (Cont.)

LGWR ASYNC
Uses a new process (LNSn) to transmit the redo
data directly from the online redo log to the
standby database
Physical standby database flashback
Can flash back temporarily for reporting
Logical standby database
Automatically deletes applied archived log
RMAN
Automatically creates temp datafiles after
recovery

36
11g Improvements

Physical standby database open read/write for
test or other purposes with zero compromise in
data protection using new Snapshot Standby
Automatic failover configurable for immediate
response to designated events or errors
More flexibility in primary/standby
configurations
e.g. Windows primary and Linux standby
Rolling upgrade options now in physical standby
with Transient Logical Standby
ASYNC transport enhanced to eliminate the impact
of latency on network throughput

37
11g Improvements (Cont.)

Fast detection of corruptions caused by lost
writes in the storage layer
SQL Apply supports XML data type (CLOB)
Many performance, manageability, and security
enhancements
Support for new Oracle Database 11g Options
Oracle Active Data Guard and Oracle Advanced
Compression
Fast Start Failover now available for Maximum
Performance mode

38
Outline

Overview
Network Optimization
ARCn LGWR Redo Transport
Checkpoint, Redo Read/Apply Recovery
Wait Events
10g R2 11g Improvements
Best Practices

39
Best Practices

Geographically separate primary standby DB
Ensure standby hardware configuration same as the
primary
Tune standby for write intensive operations
Test Data Guard before deploy in production
Set standard OS and DB parameters to recommended
values
Perform switchover testing
Fully document a failover procedure
Use FORCE LOGGING mode

40
Best Practices (Cont.)

Use real-time apply
Use the Data Guard Broker
Enable Flashback Database on both primary and
secondary databases
Evaluate using AFFIRM attribute
Possible performance issues on primary
Verify Asynchronous I/O enabled
Carefully consider DB_BLOCK_CHECKING

41
Best Practices (Cont.)

Dont multiplex standby redo logs (SRLs)
Correctly set number of SRLs
Increase PARALLEL_EXECUTION_MESSAGE_SIZE
Place SRLs in fast disk group or disks
Use at lease two standby DBs with Maximum
Protection Mode
Utilize COMMIT NOWAIT if appropriate

42
Best Practices (Cont.)

Ensure appropriate bandwidth between primary and
secondary
Increase default send receive queue sizes
TXQUEUELENGTH
NET_DEV_MAX_BACKLOG
Session Data Unit
Adjust value to 32767
Improvement during large data transmissions

43
Questions?

Lots of things we didnt cover
If we dont cover something you wanted to hear,
please contact me.

44
Session 126 Data Guard Best Practices Tuning
THANK YOU!
Please fill out evaluations!
Email Tom Canty tom.canty_at_servercare.com
Or Call 888-918-6309 http//www.ServerCare.com

Write a Comment

User Comments (0)

About PowerShow.com

Thomas E' Canty ServerCare, Inc' Session PowerPoint PPT Presentation