High Availability and Disaster Recovery - PowerPoint PPT Presentation

Slides: 29

Provided by: SamiA4

Transcript and Presenter's Notes

Title: High Availability and Disaster Recovery

1
High Availability and Disaster Recovery

Considerations and Options
Transactional Data Management Solutions

2
Agenda

Introduction
High Availability - 2006
Industry Shift from MTTF to MTTR, Continuous
Availability
Challenges in HA environments
Understanding/Evaluating HA technologies
TDM HA Solutions
Questions & Answers

3
About GoldenGate Software
GoldenGate Software is a privately held software
company that offers Transactional Data Management
solutions.
250 customers... 1500 solutions implemented in
35 countries
Established, Loyal Customer Base
Leading Industry Solutions
18,000 Node ATM Network with 24/7 Availability
Saving millions with real-time DW and zero
downtime migrations.
Database tiering handles average of 300,000
updates/hour, peaks at 800,000/hour
Achieving paperless enterprise for this visionary
healthcare provider
4
Speaker Introduction/Background

Nick Wagner
Director of Product Management, GoldenGate
Software
Transactional Data Management for Oracle and
other databases
8 years of Product Management, primarily focused
on Database Replication Solutions for High
Availability, Disaster Recovery, Reporting, and
Data Integration
5 Years Product Manager for Quest SharePlex for
Oracle

5
HA (2006)

Definition
Ratio of system uptime to sum of uptime and
downtime
Availability = MTTF / (MTTF + MTTR)
Challenges
Addressing Performance vs. Reliability in
computer systems
Hardware Faults, Software Bugs, Human errors are
realities in any complex system deployment
Enterprise applications need to function 24x7
Disasters are no longer a distant threat
Inadequate planning to handle outages
Shift in industry (and academic research) focus
From fault tolerance to reducing MTTR
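
As a rough worked example of the definition on this slide (uptime divided by the sum of uptime and downtime), the sketch below computes availability and the expected downtime per year for a fixed MTTF and a shrinking MTTR. The MTTF and MTTR figures are illustrative assumptions, not numbers from the presentation.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Availability = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def yearly_downtime_hours(mttf_hours: float, mttr_hours: float) -> float:
    """Expected hours of downtime per year at the resulting availability."""
    return (1.0 - availability(mttf_hours, mttr_hours)) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical system: one failure every 1,000 hours on average.
    for mttr in (10.0, 1.0, 0.1):
        a = availability(1000.0, mttr)
        d = yearly_downtime_hours(1000.0, mttr)
        print(f"MTTR={mttr:>5} h  availability={a:.5f}  downtime/year={d:.2f} h")
```

With MTTF held constant, each tenfold cut in MTTR cuts expected downtime by roughly the same factor, which is one way to read the shift from fault tolerance to reducing MTTR.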

6
The 3 States of Availability: Systematic View
7
High Availability Concerns (No Outage)
1 Active
Throughput

Latency
DSS vs. OLTP
conflicting requirements
Mixed workload
Data validation
Data Transformation

Common Approaches

Add more
Nodes
Resources
Infrastructure

8
High Availability Concerns (Planned Outages)
Common Approaches

Selected windows of downtime
Phased approach to maintenance

9
Planned Outages - Upgrades/Migration challenges

Maintaining SLA during planned outage
Revenue Impact
Customer Expectations
Interdependencies, Integration

Data issues
Instantiating Terabytes/Petabytes
Staging areas
Change Management
Special Handling

Synchronization issues
Incremental data movement
Source database impact

Failback strategy
System/Application verification
Continued data growth

10
High Availability Concerns (Unplanned Outages)
3 Unplanned Outage
Common Approaches

Database Restore/Recovery
RAID
Shared Disk Clusters
Standby database

11
Unplanned outages - Understanding Database
Failures

Failure points
Statement
Process
Instance
Database
Site
Failure types
Physical (Media, corruption, inconsistency
amongst redundant copies)
Logical (Incorrect DML, out-of-synch, accidental
table drop)
Failure Handling
Automatic
Manual

12
Unplanned outages - Repair as a focus

Mapping of symptoms to failure categories is
complex
Native repair solutions do not address complex or
multiple failures
Root cause analysis affects MTTR
Failover, isolated repair will replace
conventional recovery in computing environments

13
Evaluating HA Technologies

Availability
Is the Failover/DR solution available for real
use?
MTTR (RTO)
In the event of a failure, how soon can the data
be recovered?
Performance
Speed and support for high volumes
Data Loss (RPO)
What is the impact of an unplanned outage in
terms of lost data?
Zero downtime
Does the solution allow for zero downtime during
planned outages?
Manageability
Configuration, Install, Monitoring
Impact on deployed systems
How intrusive? What is the impact on data itself?
Cost
Licensing, maintenance
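
To make the MTTR (RTO) and Data Loss (RPO) criteria above concrete, here is a minimal sketch that measures both from three timestamps of a failover test. The function names, the timestamps and the targets are hypothetical and only illustrate the arithmetic.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_replicated_commit: datetime, failure_time: datetime) -> timedelta:
    """Data-loss window: commits made after this point never reached the standby."""
    return max(failure_time - last_replicated_commit, timedelta(0))


def achieved_rto(failure_time: datetime, service_restored: datetime) -> timedelta:
    """Time from the failure until the application was usable again."""
    return service_restored - failure_time


def meets_objectives(rpo: timedelta, rto: timedelta,
                     rpo_target: timedelta, rto_target: timedelta) -> bool:
    """True when both measured values are within their targets."""
    return rpo <= rpo_target and rto <= rto_target


if __name__ == "__main__":
    # Hypothetical failover test.
    last_commit = datetime(2006, 1, 1, 12, 0, 0)
    failure = datetime(2006, 1, 1, 12, 0, 5)
    restored = datetime(2006, 1, 1, 12, 2, 0)

    rpo = achieved_rpo(last_commit, failure)
    rto = achieved_rto(failure, restored)
    ok = meets_objectives(rpo, rto,
                          rpo_target=timedelta(seconds=10),
                          rto_target=timedelta(minutes=5))
    print(rpo, rto, ok)
```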

14
Differentiating HA Technologies

Conventional Backup/Recovery
RAID
multiple hard disks behaving as a single large
fast drive (mirrors/stripes/duplexing/parity)
Snapshots

Block Level Database Replication
Change Level Database Replication
Remote Mirroring Solutions
Transactional Data Management

[Diagram labels: High Availability and Disaster Recovery; Roll Forward / File Protection]
15
HA Technologies Tradeoffs

Block based database replication
Standby kept in constant recovery (mount) mode
Useful for strict disaster recovery only, not HA
Cannot be used for reporting in recovery mode
No write access for distributed load balancing
Application response times suffer after failover
Cannot address availability across heterogeneous
systems
Change based database replication
Trigger or log based
Not optimized for real time performance
Intrusive, Complex
Cannot address availability across heterogeneous
systems

16
HA Technologies Tradeoffs

Remote mirroring solutions
Volume managers maintain mirrors of local writes
on a set of remote volumes
Useful for file protection
Physical distance to remote volumes is a critical
limitation
No protection from logical corruption, or storage
stack corruption
Message based logical writes sent by primary host
over IP to remote hosts (synchronously/asynchronously)
Write ordering must be maintained by primary host
Remote volumes are standby-only, applications
cannot access them
No protection from logical corruption
Hardware based
Storage arrays propagate IOs to storage arrays at
a secondary site
Secondary arrays are inaccessible during
replication
No protection from logical corruption
Only useful for block availability during DR

17
Oracle Technologies Tradeoffs

RAC
Good for protection from system failures
Shared disk architecture can result in single
point-of-failure
Complex deployment, no protection from media
failure
Data Guard
Physical standby
Runs in inactive mode (mounted)
Cold cache increases MTTR from transactional
standpoint
Network latency (over SQL*Net)
Media recovery process lags significantly during
heavy workloads
Logical standby
Redo/Archive logs shipped over the network to
standby site
Real time reporting, High throughput workloads
(9i limited support)
Vulnerable to data loss (9i)
RTA Performance impact on LGWR
Read Only access for data set being logically
protected

18
Oracle Technologies Tradeoffs

Streams
Good for information sharing in low to moderate
throughput environments
Allows Oracle databases to be on different
platforms
Limited support for datatypes in pre 10g release
Metadata managed within database
Requires custom application for capture from
non-Oracle database

19
HA Technologies Tradeoffs

Transactional Data Management
Addresses low latency in HA hybrid computing
environments (built on a 1-safe protocol for
highest performance)
Management of transactional streams -- captures,
transforms, routes, delivers and verifies data
transactions in real time
Real time, heterogeneous, data integrity, low
impact
Use cases for HA, DR, data integration,
distributed computing
Not for file-level replication
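
The slide mentions a 1-safe protocol. In the general replication literature, 1-safe means the primary commits and returns to the application without waiting for the secondary, while 2-safe means the commit only returns once the secondary also has the change. The toy classes below contrast the two under that general definition. They are an illustration with hypothetical names, not the product's implementation.

```python
from collections import deque


class Standby:
    """Secondary copy of the data."""

    def __init__(self):
        self.rows = {}

    def apply(self, key, value):
        self.rows[key] = value


class Primary:
    def __init__(self, standby: Standby):
        self.rows = {}
        self.standby = standby
        self.pending = deque()  # committed locally, not yet shipped

    def commit_1safe(self, key, value):
        """Commit locally and return at once; the change is shipped later."""
        self.rows[key] = value
        self.pending.append((key, value))

    def commit_2safe(self, key, value):
        """Return only after the standby has applied the change as well."""
        self.rows[key] = value
        self.standby.apply(key, value)

    def ship_pending(self):
        """Asynchronous propagation step, preserving commit order."""
        while self.pending:
            self.standby.apply(*self.pending.popleft())


if __name__ == "__main__":
    primary = Primary(Standby())
    primary.commit_1safe("acct:1", 100)
    print(primary.standby.rows)  # change not on the standby yet
    primary.ship_pending()
    print(primary.standby.rows)  # change has now arrived
```

The trade-off is visible in the gap between `commit_1safe` and `ship_pending`: commits are fast because they never wait on the network, but anything still in `pending` when the primary fails is exposed to loss.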

20
How TDM Works: Modular Building Blocks
Capture: Committed changes are captured (and can
be filtered) as they occur by reading the
transaction logs.
Trail files: Stage and queue data for routing.
Route: Data is compressed and encrypted for routing
to targets.
Delivery: Applies transactional data with
guaranteed integrity, transforming the data as
required.
[Diagram labels: Source Database(s), Filtered Capture, LAN / WAN / Internet, Filtered Delivery, Target Database(s), Manager (twice)]
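
The four building blocks on this slide (capture, trail files, route, delivery) can be sketched as a toy in-memory pipeline. Everything here is an assumption made for illustration: the record layout, the function names, JSON as the trail format and zlib for compression. Encryption is left out, and a real capture process reads database transaction logs rather than a Python list.

```python
import json
import zlib
from typing import Callable, Iterable


def capture(log: Iterable[dict], keep: Callable[[dict], bool]) -> list[dict]:
    """Read a simulated transaction log; keep committed changes that pass the filter."""
    return [rec for rec in log if rec["committed"] and keep(rec)]


def write_trail(records: list[dict]) -> list[bytes]:
    """Stage each captured record as one serialized trail entry."""
    return [json.dumps(rec, sort_keys=True).encode("utf-8") for rec in records]


def route(trail: list[bytes]) -> bytes:
    """Compress the staged entries for transfer (encryption omitted in this sketch)."""
    return zlib.compress(b"\n".join(trail))


def deliver(payload: bytes, target: dict, transform: Callable[[dict], dict]) -> None:
    """Decompress, transform and apply each change to the target, in order."""
    data = zlib.decompress(payload)
    if not data:
        return  # nothing was captured
    for line in data.split(b"\n"):
        rec = transform(json.loads(line))
        target[rec["key"]] = rec["value"]


if __name__ == "__main__":
    log = [
        {"key": "acct:1", "value": 100, "table": "accounts", "committed": True},
        {"key": "acct:2", "value": 50, "table": "accounts", "committed": False},
        {"key": "audit:9", "value": 1, "table": "audit", "committed": True},
    ]
    target = {}
    captured = capture(log, keep=lambda r: r["table"] == "accounts")
    deliver(route(write_trail(captured)), target, transform=lambda r: r)
    print(target)
```

Only the committed `accounts` change reaches the target: the uncommitted row is dropped at capture, and the `audit` row is removed by the capture filter.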
21
HA/DR Solution Examples
22
HA Configuration: Multi-Master
[Diagram label: Master]

Bi-directional configuration dual-master for
load balancing, improved performance and
throughput
For
Highest Availability
Maximized ROI on hardware (transaction
balancing)
Example areas
24x7 (ATMs, Online Banking)
Online Retail

[Diagram labels: Active, Master, Active]
23
HA Configuration: Scalability

Improve scalability and performance of
transaction processing by offloading query load
to lower-cost databases/platforms
For
Horizontal scalability
Improved performance
Example areas
Online Reservations
Online Lookups

[Diagram labels: Writers, Active, Live/Active, Readers (Lookup Query Handling)]
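
Offloading query load, as described on this slide, implies that something routes writes to the transaction-processing database and lookups to the reader copies. The sketch below shows one naive way to do that at the application level. The `Router` class and the database names are hypothetical, the `SELECT` prefix test is a deliberate simplification, and it assumes the readers are kept current by replication.

```python
import itertools


class Router:
    """Send queries to readers in round-robin order and everything else to the writer."""

    def __init__(self, writer: str, readers: list[str]):
        self.writer = writer
        self._readers = itertools.cycle(readers) if readers else None

    def route(self, sql: str) -> str:
        # Naive classification: any statement starting with SELECT is a lookup.
        is_query = sql.lstrip().lower().startswith("select")
        if is_query and self._readers is not None:
            return next(self._readers)
        return self.writer


if __name__ == "__main__":
    router = Router(writer="oltp-primary", readers=["lookup-1", "lookup-2"])
    for stmt in ("SELECT * FROM flights", "UPDATE seats SET held = 1",
                 "select 1", "SELECT 2"):
        print(f"{stmt!r} -> {router.route(stmt)}")
```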
24
HA Configuration: Disaster Tolerance
[Diagram label: Active Database]

An HA implementation that captures and applies
data to a failover system in real time.
For
Fast failover (No restore)
Do root-cause analysis later!
Surgical Repair (Dynamic, Selective undo)
Example areas
24x7/mission-critical applications
Strict SLA requirements

[Diagram labels: Unplanned outage, System failure, Data failure, Failover Database]
25
HA Configuration: Switchover
[Diagram label: Current Database]

Zero-Downtime Migrations
Rolling Upgrades
Zero-Downtime Maintenance
Failback contingencies
For
24x7 availability
Reduced windows for system maintenance
Example areas
Can't afford downtime to do an in-place upgrade

[Diagram labels: Planned Outage, Switchover Database]
26
TDM HA Evaluation Criteria
Availability: Not just disaster recovery but also continuous operations
MTTR: Immediately available and up-to-date secondary system with MTTR of a few seconds
Performance: Near-zero latency; ship only committed transactions
Zero Downtime for planned outages: Downtime restricted to application switchover
Data Protection / Loss: Redo validation using SQL Apply; no loss (db read access to last IO in current log)
Manageability: Director GUI, CLI, STATS
Impact: Low impact on deployed systems; metadata outside the database
27
Thank You. Q&A

Nick Wagner nwagner_at_goldengate.com 415-369-4261
28
Contributions