DM216 The Three Ps of Replication in a High Availability Environment

About This Presentation

Title:

DM216 The Three Ps of Replication in a High Availability Environment

Description:

R4 to R2 W1CP (in more detail) W1. NJ. W1. PA. DCW1. DEW1. MDW1. VAW1. WVW1. W1DC. W1DE. W1MD ... Flat files to map database names and replication flow. rep_env. ... – PowerPoint PPT presentation

Number of Views:80

Avg rating:3.0/5.0

Slides: 47

Provided by: sybas

Category:

more less

Transcript and Presenter's Notes

Title: DM216 The Three Ps of Replication in a High Availability Environment

1
DM216The Three Ps of Replication in a High
Availability Environment
Tony T. Sheng Technical Team Lead PLW Data
Architecture Bell Atlantic tony.t.sheng_at_bellatlant
ic.com
2
The Three Ps of Replication in a High
Availability Environment

Who am I?
Overview of Configuration
Best Practices
Pitfalls
The Principles
Q A

3
The Three Ps of Replication in a High
Availability Environment

Who am I?
Tech Lead for a DBA team
ASE/Rep Server administration for DEV/TEST
Development of processes for schema changes and
custom data conversion to support application
Infrastructure and configuration design
Production Support for poor performance
Around 5 years of experience with Sybase products

4
Configuration Overview

Overview of PLW/CW configuration
Homogenous dataservers
Bi directional replication
No stored procedures, no function strings yet
No routes, one RS serves PDB-gtRDB
No Sybase Warm Standby

5
Configuration Overview

R4 HA

6
Configuration Overview

R4 to R2 W1CP

7
Configuration Overview

R4 to R2 W1CP (in more detail)

8
Configuration Overview

9
Configuration Overview

Overview of Configuration
Why is this so confusing?
TUXEDO, XA Server for 2-phase commit
Application architecture start up is too long
Can use the Rep Server to queue for maintenance
and availability
Alternative to System 12
Wanted DBAs to be disconnected from the failover

10
Best Practices

Design
Rep Server
On another host to get data out of the Rep Agent
fast
Its own host to reduce network/CPU bottleneck
Lots of disk so you dont run out of SD
Hold lots of transactions in the SD if you need
to
Data Model
Login and timestamp field on every table
Could be used in the future for conflict
resolution

11
Best Practices

Design
Data Server
Hardware layout and software configuration all to
accommodate more users in case of failover
Each database resides on its own set of disks, no
controller/disk contention
Use 11.5 Rep Agent and Rep Server
Hold lots of days in transaction logs if you need
to

12
Best Practices

Standards
Data Definition Language
Each table has unique index and primary key that
are relevant (used for auto generation of
repdefs)
Object name allows generation of a repdef that is
less than 30 characters
UNIX host and Dataserver naming standards
Each host has a isXXXXXX name
Each ASE servers is a SQLNAAXXXXXX

13
Best Practices

Standards
UNIX host and Data Server naming standards
Each host has a isXXXXXX name
is001277
Each ASE servers is a SQLNAAXXXXXX
SQL1LW001277, SQL1CW002147
Can have a backupserver, XA Server and Rep Server
REP1LW001277, REP2LW001277, REP1CW002147
SYB_BACKUP1LW001277

14
Best Practices

Standards
Replication Objects
Repdefs - rmachinedatabaseobject
r1001931w1mdtlb
Subscriptions - smachinedatabaseobject
s1001929w1mdtlb
Function strings names would also be standard

15
Best Practices

Standards
Every Sybase server is configured identically.
Filesystems, partitions, disk names, database
names
Configuration changes between the type of
application, ie. sqljumbo, replarge, etc.
Flat File Site Specific Information (SSI)
Flat files to map database names and replication
flow
rep_env.hostname
PRSPDSPDBRRSRDSRDB

16
Best Practices

Standards
Allows getting a quick diagnosis
Assists in getting a quick handle on replication
flow
Assists to drill down to specific objects that
have issues
The key is to recover quickly from problems

17
Best Practices

Operational Procedures
Who updates where ?
Replication is on the whole time and is
bidirectional
Transactions executed by the maintenance user are
not replicated
(see send_maint_xacts_to_replicate under
sp_config_rep_agent)

18
Best Practices

Operational Procedures
Who updates where ?
Application user groups are GRANTed/REVOKEd
permission based upon operating state (normal
versus failover)
One dataserver is always considered the REAL ONE,
data on this server can be assumed to be correct

19
Best Practices

Operational Procedures
Constraints
Used in the data model to keep integrity of the
data
Can bring the Rep Server thread down because of
the sequence of the SQL
We have been having OPS drop them on the
replicate side when necessary
Probably should include a DROP/ADD constraint
process during the failover

20
Best Practices

Operational Procedures
Autocorrection
Can help get rid of duplicate key errors that are
stuck in the queue when synching the data.
Cant be used with Replicate Minimal Columns
(which we dont do but probably should)

21
Best Practices

Monitor
You must monitor!!
Rep Agent and Transaction Logs
Ran into some bugs with Rep Agent hanging
Check that the transaction log dumps are not
growing in size
This would infer that the Rep Agent is not moving
through the transaction log and moving the
secondary truncation pointer

22
Best Practices

Monitor
Replication Server DSI Threads
Had a few days where the thread(s) were down (OPS
error)
Takes time to recover from this to drain the
queue
Vulnerable if failover is needed. You would have
the potential for in flight transactions. (BIG
headache)

23
Best Practices

Monitor
Some kind of replication heartbeat
rep_verify process
Heartbeat table every 10 minutes

24
Best Practices

Monitor
Extra Monitoring
Rep Server queue throughput
Difference between first and last block every
hour
Capture trends for load, additional data, peak
volumes, future capacity

25
Best Practices

Monitor
Rep Server queue throughput

26
Best Practices

Monitor
Other various monitoring
Latency
Have not noticed too much latency but could
assimilate data with queue throughput
rs_lastcommit timestamp on replicate database -
the last time an update flowed through
select getdate(), db_name, origin,
datediff(ss,origin_time,dest_commit_time)from
rs_lastcommit where origingt0 and origin_time gt
"Jan 1 1900"

27
Best Practices

Monitor
Other various monitoring
Aggregate information for our rollup replication

89354 ---- md 1449 ---- wv 328029 ----
pa 63383 ---- dc 25255 ---- de 126779 ----
va 98929 ---- nj 733178
SQL1R2051972 -----------
733178 SQL1R2051973 ----------- 733178
28
Best Practices

Tools and Utilities
droprep.sh DBNAME ALL
Utility to drop all replication components for a
database or all databases on the dataserver
Large schema changes when it is easier to rebuild
all replication
Drops subs/repdefs
Shutdown Rep Agents and disable
Remove secondary truncation point (most important)

29
Best Practices

Tools and Utilities
Standard process to build replication
Establish connection for PDB/RDB
Turn on Rep Agent for PDB
Build repdefs/subs
Set replicate bit
Grant permissions to maintenance user

30
Best Practices

Tools and Utilities
REPON/REPOFF.sh
Utility to temporarily turn off and on
replication
Shutdown Rep Agents and disable secondary
truncation point
Resets truncation point, turns on Rep Agent

31
Best Practices

Tools and Utilities
dump/rcp/load
Set of scripts to synch up databases from primary
to replicate
Make sure no users on the replicate so the
database can be loaded
Must reset generation id, reset truncation point,
reconfigure Rep Agent to the correct Rep Server
(two way)

32
Best Practices

Tools and Utilities
REPdelta
Process which adds/drops/synchs replication
objects to database schema changes
Reads from flat files that have been generated by
data model process

33
Pitfalls

Recovering Data from Operational Errors
Queue was down and a failover was executed
Have to figure out what updates were held at the
PDB and what updates were executed at the RDB.
Timestamp field can help here.

34
Pitfalls

Application Doing Very Bad Things
A HUGE Transaction
One large update took more than 75000 locks on
the replicate database.
Rep Server would open the transaction, flow, get
an error for insufficient locks, rollback the
transaction and then try again (for 36 hours)
Had to shutdown the RDS and reconfigure the
number of locks and let the Rep Server replicate
the transaction
Got lucky because the RDS was not in use

35
Pitfalls

Application Doing Very Bad Things
Moving the Primary Key of a Unique Index Column
to another value
This is a documented restriction.
Causes the DSI thread to go down because the
order of the transaction causes a duplicate key
violation
Must skip the transaction
see Chapter 8 of the Rep Server Admin Guide

36
Pitfalls

Application Doing Very Bad Things
Moving the Primary Key of a Unique Index Column
to another value

Table cu_ln clustered unique index is on cu_f,
cu_ln_f update cu_ln set a.cu_ln_f (select
a.cu_ln_f max(b.cu_ln_f) from cu_ln b
where b.cu_f 23732123) from cu_ln a where
a.cu_f 982323433 go
37
Pitfalls

Application Doing Very Bad Things
Moving the Primary Key of a Unique Index Column
to another value
the workaround involves making all the updates to
a temp table and then inserting them and deleting
the old rows

1. select into getoldculn from cu_ln where
cu_f OLD 2. update getoldculn set cu_ln_f
(select cu_ln_f max(cu_ln_f) from getoldculn
where cu_f OLD 3. insert into cu_ln select
from getoldculn where cu_f OLD
38
Pitfalls

DBA functions
Tried to move the logsegment off a device onto
another device using sp_dropsegment and
sp_extendsegment
alter database xlvww1ma log on SYBLDEV26 500
go
use xlvww1ma
go
sp_dropsegment "logsegment", "xlvww1ma","SYBLDEV04
"
go
sp_extendsegment "default", "xlvww1ma","SYBLDEV04"
go

39
Pitfalls

DBA functions
Tried to move the logsegment off a device onto
another device using sp_dropsegment and
sp_extendsegment
Ended up getting 605 and 692 errors on the log
segment of the database.
Should have turned off the Rep Agent before any
of this work.
Recovered by reloading the databases. (painful)

40
Pitfalls

DBA functions
Bad constraint added manually
Had a DSI thread that was up but no data would
flow.
Maintenance user spid IOs never increased.
Queue would drain when autocorrection was turned
on
Found a bad constraint on the replicate database
(most likely put in by manual process
C_AD_gssl_adr_rng_1
adr FOREIGN KEY (adr_f) SELF REFERENCES
adr(adr_f)

41
Pitfalls

Synching databases
You will probably have to synch your data
periodically
Skipped transactions
Incorrect failover
Make this a documented standard procedure so it
is readily available.
Use autocorrection to avoid the first few
duplicate key errors from data that is in the log

42
The Principles

A Replication Lab
Have a restricted development lab that mimics
production
Owned by DBAs, no other teams have access
Same DS/RS configuration
Number of databases, Rep Servers, rs_config,
sp_configure
Used to isolate replication problems
Mock deploy replication fixes
Test out new replication topologies
Recreate production problems

43
The Principles

Send all SQL through the SW lifecycle
When we get burned, 90 is from SQL that didnt
go through the software lifecycle.
Be stringent about application transaction size
every SQL is reviewed by DBA team for
Performance standards (valid SARGs, good
showplan, etc.)
Replication Server standards
(no update table set col1T where col1R) -gt
sends into a big update transaction on the
replicate

44
The Principles

More Principles We Have Learned
Have good standards
Assist you in debugging quickly
Start simple with Rep Server
Monitor, Monitor, Monitor
Know your Operating Procedures
Step by Step of failover/failback process

45
What Is Next?

Replicating data through different schemas
combination of function strings/operational steps
depending on the type of change
Configuration that is similar to System 12
move to hot/cold configuration where the disks
are shared and switched
no Rep Server for local failover but still for
DR mode data replication

46
Q A

Write a Comment

User Comments (0)

About PowerShow.com

DM216 The Three Ps of Replication in a High Availability Environment - PowerPoint PPT Presentation

DM216 The Three Ps of Replication in a High Availability Environment

R4 to R2 W1CP (in more detail) W1. NJ. W1. PA. DCW1. DEW1. MDW1. VAW1. WVW1. W1DC. W1DE. W1MD ... Flat files to map database names and replication flow. rep_env. ... – PowerPoint PPT presentation