Microsoft SQL Server High Availability and Disaster Recovery - PowerPoint PPT Presentation

1 / 49
About This Presentation
Title:

Microsoft SQL Server High Availability and Disaster Recovery

Description:

Manual intervention required to restore offline data copy. Backup and restore. Warm standby ... Backup and Restore. Database. Backup. Log Shipping. Primary ... – PowerPoint PPT presentation

Number of Views:3279
Avg rating:5.0/5.0
Slides: 50
Provided by: michael1059
Category:

less

Transcript and Presenter's Notes

Title: Microsoft SQL Server High Availability and Disaster Recovery


1
Microsoft SQL Server High Availabilityand
Disaster Recovery
  • Michael Poremba // October 2008

2
Database HA DR Experience
  • Work with business to determine HA or DR
    requirements for applications and data?
  • Design HA or DR solutions?
  • Administer HA or DR process?
  • Still learning MS SQL Server HA DR capabilities?

3
Scope of this Presentation
  • Presentation Focus
  • Beyond Scope of Presentation
  • Data Availability
  • Data recovery
  • High availability
  • Disaster recovery
  • Technology Focus
  • MS SQL Server
  • Physical servers
  • SANs
  • In-depth how-to(available elsewhere)
  • Partitioned views (federated)
  • Advanced DBA techniques
  • Custom application logic
  • 3rd-party software solutions
  • Alternate DBMS engines(e.g. Oracle DB2)
  • HA on virtual machines
  • Complex scenarios solutions
  • Load balancing

4
Introduction to Data Availability
  • So, you need to make yourproduction database
    bulletproof

5
Data Availability Continuum
  • Degrees of protection for information systems

6
Business Case for Availability
  • High Availability
  • Disaster Recovery
  • Keep business-critical applications available
  • Secondary
  • Server maintenance
  • Protect against loss of data center
  • Secondary
  • Application upgrades
  • Infrastructure upgrades

7
Service Level Agreement (SLA)
  • Permitted downtime (planned vs. unplanned?)
  • Acceptable data/transaction loss
  • Application response times
  • Mean time to recovery
  • Note Database uptime is not equivalent to
    application availability
  • Failures of other application services
  • Network outages

8
Protect What?
  • Application data stores
  • Databases
  • Files
  • Other data repositories
  • Database services
  • DBMS availability for applications
  • Application services
  • Application availability for users and external
    systems
  • Databases are the heart of most information
    systemsthey deserve the highest affordable
    protection.

9
Database Failure Scenarios
  • Physical Infrastructure Failures
  • Logical Data Failures
  • Storage subsystem
  • Disk
  • Controller
  • Network
  • Server
  • Power
  • Operator errors
  • DBMS interruption
  • Drops / deletes
  • Application defects
  • DBMS defects
  • Data corruption

10
Service Recovery Strategies
11
Data RecoveryTerminology
  • Terminology varies for source vs. copy

12
Data Recovery
  • Briefly

13
Database Backups
  • Traditional backup types
  • Full backup
  • Differential backup
  • Transaction log backup
  • Disk is better than tape
  • First backup to disk (separate physical disk
    volume)
  • Detect exceptions encountered during backup
  • Verify backup files
  • Copy backup files to tape or remote disk
  • Data retention policy for backup files

14
Database Backup Strategy
  • Backup of user databases not sufficient for
    recovery
  • System database
  • Master database
  • MSDB database
  • Model database
  • External data stores

15
Synch with External Data Stores
  • Synchronize recovered database with external data
    stores
  • Identity column seeds
  • Full-text indexes(SQL Server 2000)
  • LDAP entries
  • File system objects
  • Other databases

16
Backup Retention Policy
  • Location of backup files
  • Duration of retention
  • Protection of sensitive data
  • Sarbanes/Oxley (SOX)
  • HIPAA
  • Internal policies for data management and
    protection
  • Access to backups from offsite data storage

17
Data Recovery Process
  • Backup file sets
  • Full baseline, differential, and transaction logs
  • Retrieving backup files
  • Offsite storage
  • Tape
  • Network copy
  • Dependency on multiple people to get access to
    backup files
  • Recovery strategy depends on failure scenario
  • Create comprehensive failure matrix
  • Devise recovery strategy for each scenario
  • Does worst-case recovery scenario fit within SLA
    parameters?
  • Recovery time SLA
  • Include future data growth in recovery plan
  • Fully test recovery strategiespractice is
    essential

18
High Availability
19
High Availability
  • Minimize or avoid service downtime
  • Whether planned or unplanned
  • When components fail,service interruption is
    brief or non-existent
  • Automatic failover
  • Eliminate single points of failure (as
    affordable)
  • Redundant components
  • Fault-tolerant servers

20
Redundant Components
  • Objective Avoid single points of failure (where
    affordable)
  • Approach Use redundant components for database
    service
  • Database server nodes
  • Server components
  • ECC RAM failure-tolerant HW OS
  • DBMS instance
  • User databases
  • Storage devices
  • Storage unit components
  • MPIO Interfaces paths switches controllers
  • RAID Disks
  • Networking
  • MPIO Interfaces paths switches
  • Data copies
  • E.g. Recovering torn page from mirror in SQL
    Server 2008

21
Transaction Log Shipping
  • Warm standby solution
  • Duplicate user database
  • Copy transaction logs to standby server restore
  • Database available for read-only access
  • Users must disconnect for logs to be applied
  • Two database licenses required if querying
    standby
  • Manual application failover
  • Supported on standard hardware
  • Possible data loss (unapplied transactions)

22
Database Mirroring
  • Redundancy at user database level
  • Duplicate copy of user database
  • Independent storage devices
  • Multiple copies of instance databases
  • Mirrored over private network channel
  • Mirror always redoing transactions from principal
  • Negligible impact on transaction throughput
  • Multiple mirroring modes
  • High-availability commit _at_ log on mirror
    automatic failover
  • High-protection commit _at_ log on mirror manual
    failover
  • High-performance commit when logged on principal
  • Very fast automatic failoverseconds
  • Requires witness server
  • Mirror-aware application client connection
  • Provided by client library
  • Database connection string must specify both
    servers
  • Mirror may be available for read-only access
    (snapshots)
  • Works with standard hardware

23
Mirror Witness
  • With mirroring, more than one server is required
    to decide on failover
  • Witness automates failover from primary to mirror
  • Watches database availability
  • Reports observations back to principal and mirror
  • Runs in separate SQL Server instance (Express is
    OK)
  • Prevents split brain scenario
  • Very low resource consumption
  • Can be witness for multiple databases
  • Not a single point of failure

24
SQL Server Failover Clustering
  • Two clustered nodes
  • Active/Passive config
  • MS SQL services
  • Running on virtual server
  • Shared storage device
  • User databases
  • System databases
  • Quorum drive
  • Redundant internal components

25
Active/Passive Failover Clustering
  • Redundancy at database instance level
  • All databases fail over together
  • Shared copy of system databases
  • Single data copy on shared storage device
  • No I/O overhead reducing throughput
  • Storage unit is single point of failure for
    cluster
  • All database services are clustered
  • SQL Agent Analysis Services Full-Text engine,
    MS DTC
  • Automatic failover (up to minutes)
  • DBMS accessed over virtual IP
  • Database not available from inactive node for DB
    client connections
  • Storage is controlled by one cluster node at a
    time
  • Requires hardware certified by Microsoft for
    Microsoft Cluster Service

26
HA Comparison
  • Database Mirroring
  • Failover Clustering
  • Scope user DB
  • Standard hardware
  • One SQL license(unless querying snapshots on
    mirror)
  • Very fast failover (seconds)
  • OS flexible (e.g. 32/64)
  • Independent storage
  • Independent services
  • Reporting on mirror
  • Geographic separation OK
  • Scope DBMS instance
  • Certified hardware
  • One SQL license(only one node can access
    database)
  • Automatic failover (up to minutes)
  • Enterprise OS
  • Shared storage
  • Clustered services
  • Standby not available
  • Servers are usually co-located

27
Considerations for HA
  • HA complements backup and recovery strategy
  • Does not replace data recovery plan
  • Application service availability is often
    determined by a network of interdependent
    services
  • Availability can be difficult to define (e.g.
    partial failures)
  • Failure probability difficult to measure or
    compute
  • Increased system complexity could lead to lower
    service availability!
  • Operator error a leading cause of availability
    issues
  • Increased number/types of system components
  • More complex to configure and administer

28
Data Recovery Requirements
29
Disaster Recovery
30
Disaster Recovery
  • Minimize downtime of business operations
  • Redundant systems and facilities
  • SQL Server features
  • Transaction log shipping
  • Database mirroring
  • Failover clustering
  • Other technologies
  • Storage-based mirroring

31
Disaster Recovery Planning
  • Data security requirements
  • Clarify SLA, data loss allowance
  • Evaluate system cost vs. data protection
  • Failure analysis
  • System redundancy
  • Process validation
  • Training for personnel
  • Prevention practices
  • Executing disaster recovery and business
    continuity
  • Practice, practice, practice

32
Business Continuity Facility
  • System redundancy
  • Systems Web servers app servers database, etc.
  • Data Databases data files on OS security info,
    etc.
  • Networking Domain, routing, subnet, VIPs, etc.
  • Alternate facilities
  • Network bandwidth
  • Physical or network access by operations staff
  • Failover
  • Often a deliberate decision, using manual failover

33
Data Redundancy
  • Synchronous redundancy
  • Network bandwidth cost
  • Network latency and application performance
  • Network reliability
  • Asynchronous redundancy
  • Risk of data loss
  • More cost-effective
  • Resilient to network latency issues
  • Candidate Technologies
  • SQL Server database mirroring
  • Failover clustering with SAN-based mirroring

34
DR Using Database Mirroring
  • Two sites Primary and DR location
  • Separate failover clusters at each site
  • SQL Server database mirroring between sites

35
DR Using SAN-Based Mirroring
  • Two sites Primary and DR location
  • Four-node failover cluster one virtual IP
    address
  • SAN-based mirroring between sites
  • Manual cluster failover

36
Complimentary Technologies
  • Skip if time is running short.

37
SAN-Based Data Mirroring
  • Data blocks duplicated at storage level
  • Similar to transaction log shipping
  • Copy performed in sequence and coordinated with
    database checkpoint
  • Ensures consistency of mirrored data files
  • Synchronous or asynchronous mirroring
  • Co-located or geographically dispersedboth are
    OK
  • SAN link bandwidth must support database I/O rate
  • May require extra feature support from SAN vendor
  • Could rely on Failover Clustering for HA

38
SQL Server Database Snapshots
  • Read-only point-in-time database snapshot
  • No data is copiedinstantaneous
  • Historical snapshot pages tracked separately from
    changing pages
  • Snapshots can be maintained indefinitely
  • Limited only by available storage
  • Snapshot copy can be used for reporting
  • Read-only, so no locking issues

39
SQL Server Replication
  • Transactional replication
  • High transaction volume
  • Low data latency required
  • Mixed technologies Integrates with other DBMS
  • Merge replication
  • Bi-directional data changes
  • Typically server-to-client
  • Snapshot replication
  • Large, infrequent data changes
  • Data change latency OK
  • Best for smaller data sets
  • Subscriber databases available for reporting
  • Replicate data subsets
  • Some data loss is possible
  • Periodically validate replicated data

40
App Development and Admin
41
Considerations for App Developers
  • App services tolerant to database service
    interruptions
  • Application transactions must be handled in
    codedata consistency
  • Exception handling for transaction retry,
    connection recovery
  • Requires coding standards, code reviews, and
    testing
  • Bulk data operations
  • Transaction volume impacts rollback time during
    failover
  • Batch jobs must be run on alternate nodes
  • Dont bypass transaction logging
  • Synchronization with external data sources?
  • Be aware of database recovery model
  • Mirroring uses FailoverPartner in connection
    string
  • Use TCP/IP as client protocol

42
Considerations for Admins
  • Use identical server hardware, when possible
  • Design network redundancies, when feasible
  • Consider network latency for geographic
    separation
  • Always manage through virtual cluster, not
    individual cluster nodes
  • Retest failover/failback after HA maintenance
  • Diagnose after failover
  • Repair alternate node
  • Resynchronize data, as necessary
  • Be aware of primary/secondary locations
  • Ensure application services are connected and
    functioning properly
  • Keep server node configurations synchronized
  • Service pack and patch levels
  • Duplicate non-redundant resources
  • Jobs logins and permissions OS sys objects

43
HA Risks
  • System performance degradation
  • HA system complexity leads to availability issues
  • Some system failures not planned for
  • Backup and recovery planning incomplete
  • Administrators not fully trained or informed
  • User databases not synchronized with other data
    sources

44
Common Admin Use Cases
  • Maintain HA nodes
  • Hardware maintenance
  • Rolling upgrades and software patches
  • Resynchronize the redundant copy
  • Re-synch mirror
  • Restart log shipping
  • Diagnose and repair
  • Diagnose cause of failover
  • Repair failed node and restore failover
    capabilities
  • Test failover and failback

45
Common Admin Actions
  • Train and practice administrators to
  • Initiate a database mirror
  • Manually failover mirror database or cluster node
  • Add/remove passive node from mirror or cluster
  • Upgrade/patch servers nodes
  • Restart or redirect application services

46
More Information
47
ReferencesBooks
  • High Availability
  • Related Topics
  • Microsoft SQL Server 2008 High Availability with
    Clustering Database Mirroringby Michael Otey,
    2009.
  • Microsoft SQL Server High Availabilityby Paul
    Bertucci, 2004.
  • Pro SQL Server 2005 High Availabilityby Allan
    Hirt, 2007.
  • Pro SQL Server 2005 Replication by Sujoy Paul,
    2006.
  • Pro SQL Server 2005 Service Brokerby Klaus
    Aschenbrenner, 2007.
  • The Rational Guide to SQL Server 2005 Service
    Brokerby Roger Wolter, 2006.

48
ReferencesPresentations
  • Microsoft Load Balancing and Clusteringhttp//ce.
    sharif.edu/courses/84-85/2/ce317/resources/root/le
    cture20slides/14.20Microsoft20Load20Balancing
    20and20Clustering.ppt
  • SQL Server 2005 High Availabilityhttp//www.atlan
    tamdf.com/Presentations/AtlantaMDF_111207HA.ppt
  • High Availability Technologies In SQL Server 2000
    And SQL Server 2005 http//202.181.238.2/hk/teche
    d2004/ppt/Day_2_Rm407/DAT431(1330-1445).ppt
  • Meeting the Availability Challengehttp//download
    .microsoft.com/download/E/D/C/EDCF54DB-19CD-4882-9
    FC4-4F7D46FCEAA6/HighAvailability.ppt
  • Disaster Recovery Mistakeshttp//www.sqlsig.org/O
    ct201120DASSUG20-20Jason20Hall2010-11-0720M
    M.ppt
  • SQL Server 2005 High Availabilityhttp//blogs.msd
    n.com/sql2005event/attachment/564303.ashx
  • Effective Usage of SQL Server 2005 Database
    Mirroring http//www.sqlserver-qa.net/SSQA-Effect
    ive20Usage20of20SQL20Server20200520Database
    20Mirroring_show.ppt

49
ReferencesArticles
  • Achieve High Availability for SQL
    Serverhttp//technet.microsoft.com/en-us/magazine
    /cc162477.aspx
  • Geographically Dispersed Clusters in Windows
    Server 2003http//www.microsoft.com/windowsserver
    2003/techinfo/overview/clustergeo.mspx
  • Restoring file and filegroup backups
    http//support.microsoft.com/kb/281122/en-us
  • Restoring specific tables or rows from
    backupshttp//support.microsoft.com/kb/321836/en-
    us
  • Maintaining Availability During
    Upgradeshttp//msdn.microsoft.com/en-us/library/m
    s191449.aspx
Write a Comment
User Comments (0)
About PowerShow.com