DM216 The Three Ps of Replication in a High Availability Environment - PowerPoint PPT Presentation

1 / 46
About This Presentation
Title:

DM216 The Three Ps of Replication in a High Availability Environment

Description:

R4 to R2 W1CP (in more detail) W1. NJ. W1. PA. DCW1. DEW1. MDW1. VAW1. WVW1. W1DC. W1DE. W1MD ... Flat files to map database names and replication flow. rep_env. ... – PowerPoint PPT presentation

Number of Views:80
Avg rating:3.0/5.0
Slides: 47
Provided by: sybas
Category:

less

Transcript and Presenter's Notes

Title: DM216 The Three Ps of Replication in a High Availability Environment


1
DM216The Three Ps of Replication in a High
Availability Environment
Tony T. Sheng Technical Team Lead PLW Data
Architecture Bell Atlantic tony.t.sheng_at_bellatlant
ic.com
2
The Three Ps of Replication in a High
Availability Environment
  • Who am I?
  • Overview of Configuration
  • Best Practices
  • Pitfalls
  • The Principles
  • Q A

3
The Three Ps of Replication in a High
Availability Environment
  • Who am I?
  • Tech Lead for a DBA team
  • ASE/Rep Server administration for DEV/TEST
  • Development of processes for schema changes and
    custom data conversion to support application
  • Infrastructure and configuration design
  • Production Support for poor performance
  • Around 5 years of experience with Sybase products

4
Configuration Overview
  • Overview of PLW/CW configuration
  • Homogenous dataservers
  • Bi directional replication
  • No stored procedures, no function strings yet
  • No routes, one RS serves PDB-gtRDB
  • No Sybase Warm Standby

5
Configuration Overview
  • R4 HA

6
Configuration Overview
  • R4 to R2 W1CP

7
Configuration Overview
  • R4 to R2 W1CP (in more detail)

8
Configuration Overview
  • CW

9
Configuration Overview
  • Overview of Configuration
  • Why is this so confusing?
  • TUXEDO, XA Server for 2-phase commit
  • Application architecture start up is too long
  • Can use the Rep Server to queue for maintenance
    and availability
  • Alternative to System 12
  • Wanted DBAs to be disconnected from the failover

10
Best Practices
  • Design
  • Rep Server
  • On another host to get data out of the Rep Agent
    fast
  • Its own host to reduce network/CPU bottleneck
  • Lots of disk so you dont run out of SD
  • Hold lots of transactions in the SD if you need
    to
  • Data Model
  • Login and timestamp field on every table
  • Could be used in the future for conflict
    resolution

11
Best Practices
  • Design
  • Data Server
  • Hardware layout and software configuration all to
    accommodate more users in case of failover
  • Each database resides on its own set of disks, no
    controller/disk contention
  • Use 11.5 Rep Agent and Rep Server
  • Hold lots of days in transaction logs if you need
    to

12
Best Practices
  • Standards
  • Data Definition Language
  • Each table has unique index and primary key that
    are relevant (used for auto generation of
    repdefs)
  • Object name allows generation of a repdef that is
    less than 30 characters
  • UNIX host and Dataserver naming standards
  • Each host has a isXXXXXX name
  • Each ASE servers is a SQLNAAXXXXXX

13
Best Practices
  • Standards
  • UNIX host and Data Server naming standards
  • Each host has a isXXXXXX name
  • is001277
  • Each ASE servers is a SQLNAAXXXXXX
  • SQL1LW001277, SQL1CW002147
  • Can have a backupserver, XA Server and Rep Server
  • REP1LW001277, REP2LW001277, REP1CW002147
  • SYB_BACKUP1LW001277

14
Best Practices
  • Standards
  • Replication Objects
  • Repdefs - rmachinedatabaseobject
  • r1001931w1mdtlb
  • Subscriptions - smachinedatabaseobject
  • s1001929w1mdtlb
  • Function strings names would also be standard

15
Best Practices
  • Standards
  • Every Sybase server is configured identically.
  • Filesystems, partitions, disk names, database
    names
  • Configuration changes between the type of
    application, ie. sqljumbo, replarge, etc.
  • Flat File Site Specific Information (SSI)
  • Flat files to map database names and replication
    flow
  • rep_env.hostname
  • PRSPDSPDBRRSRDSRDB

16
Best Practices
  • Standards
  • Allows getting a quick diagnosis
  • Assists in getting a quick handle on replication
    flow
  • Assists to drill down to specific objects that
    have issues
  • The key is to recover quickly from problems

17
Best Practices
  • Operational Procedures
  • Who updates where ?
  • Replication is on the whole time and is
    bidirectional
  • Transactions executed by the maintenance user are
    not replicated
  • (see send_maint_xacts_to_replicate under
    sp_config_rep_agent)

18
Best Practices
  • Operational Procedures
  • Who updates where ?
  • Application user groups are GRANTed/REVOKEd
    permission based upon operating state (normal
    versus failover)
  • One dataserver is always considered the REAL ONE,
    data on this server can be assumed to be correct

19
Best Practices
  • Operational Procedures
  • Constraints
  • Used in the data model to keep integrity of the
    data
  • Can bring the Rep Server thread down because of
    the sequence of the SQL
  • We have been having OPS drop them on the
    replicate side when necessary
  • Probably should include a DROP/ADD constraint
    process during the failover

20
Best Practices
  • Operational Procedures
  • Autocorrection
  • Can help get rid of duplicate key errors that are
    stuck in the queue when synching the data.
  • Cant be used with Replicate Minimal Columns
    (which we dont do but probably should)

21
Best Practices
  • Monitor
  • You must monitor!!
  • Rep Agent and Transaction Logs
  • Ran into some bugs with Rep Agent hanging
  • Check that the transaction log dumps are not
    growing in size
  • This would infer that the Rep Agent is not moving
    through the transaction log and moving the
    secondary truncation pointer

22
Best Practices
  • Monitor
  • Replication Server DSI Threads
  • Had a few days where the thread(s) were down (OPS
    error)
  • Takes time to recover from this to drain the
    queue
  • Vulnerable if failover is needed. You would have
    the potential for in flight transactions. (BIG
    headache)

23
Best Practices
  • Monitor
  • Some kind of replication heartbeat
  • rep_verify process
  • Heartbeat table every 10 minutes

24
Best Practices
  • Monitor
  • Extra Monitoring
  • Rep Server queue throughput
  • Difference between first and last block every
    hour
  • Capture trends for load, additional data, peak
    volumes, future capacity

25
Best Practices
  • Monitor
  • Rep Server queue throughput

26
Best Practices
  • Monitor
  • Other various monitoring
  • Latency
  • Have not noticed too much latency but could
    assimilate data with queue throughput
  • rs_lastcommit timestamp on replicate database -
    the last time an update flowed through
  • select getdate(), db_name, origin,
    datediff(ss,origin_time,dest_commit_time)from
    rs_lastcommit where origingt0 and origin_time gt
    "Jan 1 1900"

27
Best Practices
  • Monitor
  • Other various monitoring
  • Aggregate information for our rollup replication

89354 ---- md 1449 ---- wv 328029 ----
pa 63383 ---- dc 25255 ---- de 126779 ----
va 98929 ---- nj 733178
SQL1R2051972 -----------
733178 SQL1R2051973 ----------- 733178
28
Best Practices
  • Tools and Utilities
  • droprep.sh DBNAME ALL
  • Utility to drop all replication components for a
    database or all databases on the dataserver
  • Large schema changes when it is easier to rebuild
    all replication
  • Drops subs/repdefs
  • Shutdown Rep Agents and disable
  • Remove secondary truncation point (most important)

29
Best Practices
  • Tools and Utilities
  • Standard process to build replication
  • Establish connection for PDB/RDB
  • Turn on Rep Agent for PDB
  • Build repdefs/subs
  • Set replicate bit
  • Grant permissions to maintenance user

30
Best Practices
  • Tools and Utilities
  • REPON/REPOFF.sh
  • Utility to temporarily turn off and on
    replication
  • Shutdown Rep Agents and disable secondary
    truncation point
  • Resets truncation point, turns on Rep Agent

31
Best Practices
  • Tools and Utilities
  • dump/rcp/load
  • Set of scripts to synch up databases from primary
    to replicate
  • Make sure no users on the replicate so the
    database can be loaded
  • Must reset generation id, reset truncation point,
    reconfigure Rep Agent to the correct Rep Server
    (two way)

32
Best Practices
  • Tools and Utilities
  • REPdelta
  • Process which adds/drops/synchs replication
    objects to database schema changes
  • Reads from flat files that have been generated by
    data model process

33
Pitfalls
  • Recovering Data from Operational Errors
  • Queue was down and a failover was executed
  • Have to figure out what updates were held at the
    PDB and what updates were executed at the RDB.
    Timestamp field can help here.

34
Pitfalls
  • Application Doing Very Bad Things
  • A HUGE Transaction
  • One large update took more than 75000 locks on
    the replicate database.
  • Rep Server would open the transaction, flow, get
    an error for insufficient locks, rollback the
    transaction and then try again (for 36 hours)
  • Had to shutdown the RDS and reconfigure the
    number of locks and let the Rep Server replicate
    the transaction
  • Got lucky because the RDS was not in use

35
Pitfalls
  • Application Doing Very Bad Things
  • Moving the Primary Key of a Unique Index Column
    to another value
  • This is a documented restriction.
  • Causes the DSI thread to go down because the
    order of the transaction causes a duplicate key
    violation
  • Must skip the transaction
  • see Chapter 8 of the Rep Server Admin Guide

36
Pitfalls
  • Application Doing Very Bad Things
  • Moving the Primary Key of a Unique Index Column
    to another value

Table cu_ln clustered unique index is on cu_f,
cu_ln_f update cu_ln set a.cu_ln_f (select
a.cu_ln_f max(b.cu_ln_f) from cu_ln b
where b.cu_f 23732123) from cu_ln a where
a.cu_f 982323433 go
37
Pitfalls
  • Application Doing Very Bad Things
  • Moving the Primary Key of a Unique Index Column
    to another value
  • the workaround involves making all the updates to
    a temp table and then inserting them and deleting
    the old rows

1. select into getoldculn from cu_ln where
cu_f OLD 2. update getoldculn set cu_ln_f
(select cu_ln_f max(cu_ln_f) from getoldculn
where cu_f OLD 3. insert into cu_ln select
from getoldculn where cu_f OLD
38
Pitfalls
  • DBA functions
  • Tried to move the logsegment off a device onto
    another device using sp_dropsegment and
    sp_extendsegment
  • alter database xlvww1ma log on SYBLDEV26 500
  • go
  • use xlvww1ma
  • go
  • sp_dropsegment "logsegment", "xlvww1ma","SYBLDEV04
    "
  • go
  • sp_extendsegment "default", "xlvww1ma","SYBLDEV04"
  • go

39
Pitfalls
  • DBA functions
  • Tried to move the logsegment off a device onto
    another device using sp_dropsegment and
    sp_extendsegment
  • Ended up getting 605 and 692 errors on the log
    segment of the database.
  • Should have turned off the Rep Agent before any
    of this work.
  • Recovered by reloading the databases. (painful)

40
Pitfalls
  • DBA functions
  • Bad constraint added manually
  • Had a DSI thread that was up but no data would
    flow.
  • Maintenance user spid IOs never increased.
  • Queue would drain when autocorrection was turned
    on
  • Found a bad constraint on the replicate database
    (most likely put in by manual process
  • C_AD_gssl_adr_rng_1
  • adr FOREIGN KEY (adr_f) SELF REFERENCES
    adr(adr_f)

41
Pitfalls
  • Synching databases
  • You will probably have to synch your data
    periodically
  • Skipped transactions
  • Incorrect failover
  • Make this a documented standard procedure so it
    is readily available.
  • Use autocorrection to avoid the first few
    duplicate key errors from data that is in the log

42
The Principles
  • A Replication Lab
  • Have a restricted development lab that mimics
    production
  • Owned by DBAs, no other teams have access
  • Same DS/RS configuration
  • Number of databases, Rep Servers, rs_config,
    sp_configure
  • Used to isolate replication problems
  • Mock deploy replication fixes
  • Test out new replication topologies
  • Recreate production problems

43
The Principles
  • Send all SQL through the SW lifecycle
  • When we get burned, 90 is from SQL that didnt
    go through the software lifecycle.
  • Be stringent about application transaction size
  • every SQL is reviewed by DBA team for
  • Performance standards (valid SARGs, good
    showplan, etc.)
  • Replication Server standards
  • (no update table set col1T where col1R) -gt
    sends into a big update transaction on the
    replicate

44
The Principles
  • More Principles We Have Learned
  • Have good standards
  • Assist you in debugging quickly
  • Start simple with Rep Server
  • Monitor, Monitor, Monitor
  • Know your Operating Procedures
  • Step by Step of failover/failback process

45
What Is Next?
  • Replicating data through different schemas
  • combination of function strings/operational steps
    depending on the type of change
  • Configuration that is similar to System 12
  • move to hot/cold configuration where the disks
    are shared and switched
  • no Rep Server for local failover but still for
    DR mode data replication

46
Q A
Write a Comment
User Comments (0)
About PowerShow.com