Title: Principles of Incident Response and Disaster Recovery
1. Principles of Incident Response and Disaster Recovery
- Chapter 6
- Contingency Strategies for Business Resumption Planning
2. Objectives
- Know and understand the relationships between the overall use of contingency planning and the subordinate elements of incident response, business resumption, disaster recovery, and business continuity planning
- Become familiar with the techniques used for data and application backup and recovery
- Know the strategies employed for resumption of critical business processes at alternate and recovered sites
3. Introduction
- Contingency planning addresses everything done by an organization to prepare for the unexpected
- The IR process focuses on detecting, evaluating, and reacting to an incident
- Later phases focus on keeping the business functioning even if the physical plant is destroyed or unavailable
- The business resumption (BR) plan takes over when the IR process cannot contain and resolve an incident
4. Introduction (continued)
- Major elements of the business resumption (BR) plan:
  - Disaster recovery (DR) plan: lists and describes the efforts to resume normal operations at the primary places of business
  - Business continuity (BC) plan: contains steps for implementing critical business functions using alternative mechanisms until normal operations can be resumed at the primary site or elsewhere
- Primary site: the location(s) at which the organization executes its functions
- The BR plan operates concurrently with the DR plan when damage is major or long-term
7. Introduction (continued)
- Each component of CP (IRP, DRP, and BCP) comes into play at specific times in the life of an event
- Five key procedural mechanisms for restoring critical information and facilitating continuation of operations:
  - Delayed protection
  - Real-time protection
  - Server recovery
  - Application recovery
  - Site recovery
8. Data and Application Resumption
- Backup methods must be used according to an established policy that specifies:
  - How often to back up
  - How long to retain the backups
  - What must be backed up
- Data files and critical system files should be backed up daily, with one copy on-site and one copy off-site
- Nonessential files should be backed up weekly
- Full backups: keep at least one copy in a secure location off-site
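The three policy questions above (how often, how long, what) can be captured as data. The following is a minimal sketch, not a prescribed format: the `BackupPolicy` class and its field names are hypothetical, the daily and weekly intervals come from the slide, and the 30-day retention figure is an assumed placeholder.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class BackupPolicy:
    """Hypothetical record of one backup policy entry."""
    what: str             # what must be backed up
    interval_days: int    # how often to back up
    retention_days: int   # how long to retain (assumed value, not from the slides)
    offsite_copy: bool    # whether a copy also goes off-site

    def is_due(self, last_backup: date, today: date) -> bool:
        # A backup is due once a full interval has passed.
        return today - last_backup >= timedelta(days=self.interval_days)

    def is_expired(self, backup_day: date, today: date) -> bool:
        # A backup may be discarded once it is older than the retention period.
        return today - backup_day > timedelta(days=self.retention_days)


# Daily for data and critical system files, weekly for nonessential files.
CRITICAL = BackupPolicy("data and critical system files", 1, 30, True)
NONESSENTIAL = BackupPolicy("nonessential files", 7, 30, False)
```

Keeping the policy as data rather than burying it in scripts makes the review cycle easier: the values can be audited without reading backup code.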
9. Disk-to-Disk-to-Tape Delayed Protection
- Decreasing costs of storage media, especially hard drives and removable drives, make it practical to avoid the time-consuming nature of tape backup
- Storage area networks provide online backups
- Tape backup is still required periodically, because there is no redundancy if both the online and backup versions fail or are attacked
- Initial disk-to-disk copies are efficient and can run simultaneously with other processes
- Secondary disk-to-tape copies do not affect production processing
10. Disk-to-Disk-to-Tape Delayed Protection (continued)
- Types of backups:
  - Full backup
  - Differential backup
  - Incremental backup
- Full backup
  - Includes the entire system, including applications, OS components, and data
  - Pro: provides a comprehensive snapshot
  - Con: requires large media and is time consuming
11. Disk-to-Disk-to-Tape Delayed Protection (continued)
- Differential backup
  - Includes all files that have changed or been added since the last full backup
  - Pro: faster and uses less storage space than a full backup; only one differential file is needed, along with the full backup, to restore
  - Con: gets larger each day and takes longer; if that one file is corrupt, everything since the full backup is lost
- Incremental backup
  - Includes only files that were modified that day
  - Pro: requires less space and time than the differential
  - Con: multiple incremental backups are required to restore from the last full backup
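The difference between the three backup types comes down to which cutoff is used when selecting files, and which backups are needed at restore time. The sketch below illustrates that under stated assumptions: `FileInfo`, `select_files`, and `restore_chain` are hypothetical names, time is modeled as a simple day number, and "modified that day" is generalized to "modified since the previous backup of any kind".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    path: str
    mtime: int  # day number on which the file was last modified


def select_files(files, kind, last_full_day, last_backup_day):
    """Return the paths that a backup of the given kind would copy."""
    if kind == "full":
        return sorted(f.path for f in files)
    if kind == "differential":
        cutoff = last_full_day      # everything changed since the last full
    elif kind == "incremental":
        cutoff = last_backup_day    # only what changed since the last backup
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    return sorted(f.path for f in files if f.mtime > cutoff)


def restore_chain(history):
    """Given [(day, kind), ...] in chronological order, list the backups to restore."""
    last_full = max(i for i, (_, kind) in enumerate(history) if kind == "full")
    full, later = history[last_full], history[last_full + 1:]
    differentials = [b for b in later if b[1] == "differential"]
    incrementals = [b for b in later if b[1] == "incremental"]
    if differentials and incrementals:
        raise ValueError("mixed schedules are outside the scope of this sketch")
    if differentials:
        return [full, differentials[-1]]   # full plus the latest differential
    return [full] + incrementals           # full plus every incremental since
```

The restore chain makes the trade-off concrete: a differential schedule always restores from two backups, while an incremental schedule restores from the full backup plus one backup per day since.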
12. Disk-to-Disk-to-Tape Delayed Protection (continued)
- Fastest backup method: incremental backups
- Fastest recovery time: differential backups
- All on-site and off-site storage must be secured and must have a controlled environment (temperature and humidity)
- Media should be clearly labeled and write-protected
- Tape media types:
  - Digital audio tape (DAT)
  - Quarter-inch cartridge (QIC)
  - 8 mm tape
  - Digital linear tape (DLT)
13. Disk-to-Disk-to-Tape Delayed Protection (continued)
- Typical backup scheduling:
  - Daily on-site incremental or differential backup
  - Weekly off-site full backup
- Tape media should be retired and replaced periodically
- Popular strategies for rotating the backup media:
  - Six-tape rotation
  - Grandfather-Father-Son
  - Towers of Hanoi
14. Disk-to-Disk-to-Tape Delayed Protection (continued)
- Six-tape rotation
  - Uses a rotation of six sets of media
  - Five media sets are used per week, with one extra set labeled Friday2
  - The Friday full backup is taken off-site
  - Friday1 and Friday2 are rotated off-site every week
  - Provides roughly 2 weeks of recovery capability
  - Variation: keep a copy of each off-site Friday tape on-site for faster recovery
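The six-tape rotation can be expressed as a function from a calendar date to a media label. This is a sketch under assumptions: the Monday to Thursday labels and the `tape_for` name are illustrative, and which calendar weeks use Friday1 versus Friday2 is an arbitrary choice made here by counting weeks from a fixed origin.

```python
from datetime import date

WEEKDAY_TAPES = ["Monday", "Tuesday", "Wednesday", "Thursday"]


def tape_for(day: date):
    """Return the media set label for a given date, or None on weekends."""
    weekday = day.weekday()          # Monday == 0 ... Sunday == 6
    if weekday > 4:
        return None                  # no backup scheduled on weekends
    if weekday < 4:
        return WEEKDAY_TAPES[weekday]
    # Fridays alternate between the two Friday sets. Ordinal 1 is a Monday,
    # so this counts whole Monday-based weeks from a fixed origin.
    week_index = (day.toordinal() - 1) // 7
    return "Friday1" if week_index % 2 == 0 else "Friday2"
```

Counting weeks from a fixed origin, rather than using the week number within the year, keeps the Friday alternation intact across year boundaries.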
15. Disk-to-Disk-to-Tape Delayed Protection (continued)
- Grandfather-Father-Son (GFS)
  - Uses five media sets per week
  - Allows recovery for the previous 3 weeks
  - The first week uses the first set, the second week uses the second set, and the third week uses the third set
  - The following week starts again with the first set
  - Every second or third month, a group of media sets is taken out of the cycle for permanent storage and replaced with a new set
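The weekly cycling described above can be sketched as a lookup from a date to a (group, day) media label. The function name `gfs_media`, the label format, and the requirement that the cycle starts on a Monday are assumptions made for illustration; the periodic retirement of a group to permanent storage is not modeled.

```python
from datetime import date

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]


def gfs_media(day: date, cycle_start: date, groups: int = 3):
    """Return the media label for a date, or None on weekends.

    cycle_start is assumed to be the Monday on which group 1 was first used.
    """
    offset = (day - cycle_start).days
    if offset < 0:
        raise ValueError("date is before the start of the rotation")
    if day.weekday() > 4:
        return None
    week = offset // 7                     # whole weeks since the cycle began
    return f"group{week % groups + 1}-{DAYS[day.weekday()]}"
```

With three groups, the media written in week 1 are not overwritten until week 4, which is where the three weeks of recovery capability come from.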
16. Disk-to-Disk-to-Tape Delayed Protection (continued)
- Towers of Hanoi
  - More complex approach
  - Based on statistical principles to optimize media wear
  - 16-step strategy assumes that 5 media sets are used per week on a daily basis
  - The first media set is used more often and must be monitored for wear
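The slides do not reproduce the 16-step table, so the following is a sketch of the commonly described form of the Towers of Hanoi rotation, not necessarily the exact schedule the chapter uses. It assumes five media sets named A to E and picks the set for each backup session from the number of trailing zero bits in the session number. The name `hanoi_set` is hypothetical.

```python
def hanoi_set(session: int, sets: str = "ABCDE") -> str:
    """Return the media set for a 1-based backup session number."""
    if session < 1:
        raise ValueError("session numbers start at 1")
    # (session & -session) isolates the lowest set bit; its position is the
    # number of trailing zeros, which selects how "deep" a set to use.
    trailing_zeros = (session & -session).bit_length() - 1
    # Sessions divisible by 16 or more all map to the last set.
    return sets[min(trailing_zeros, len(sets) - 1)]


# One full 16-session cycle.
schedule = "".join(hanoi_set(n) for n in range(1, 17))
```

In this form, set A is used every second session, B every fourth, C every eighth, and D and E once per 16 sessions, which matches the slide's note that the first set wears fastest.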
21. Redundancy-Based Backup and Recovery Using RAID
- Redundant array of independent disks (RAID) uses online disk drives for redundancy
- RAID spreads out data across multiple units and offers recovery from hard drive failure
- Nine established RAID configurations: RAID Levels 0 through 7 and RAID Level 10
- RAID Level 0 (disk striping without parity)
  - Not redundant
  - Spreads data across several drives in segments called stripes
  - Failure of one drive may make all data inaccessible
22. Redundancy-Based Backup and Recovery Using RAID (continued)
- RAID Level 1 (disk mirroring)
  - Uses twin drives in a system
  - All data written to one drive is written to the other simultaneously
  - Is expensive and is an inefficient use of disk space
  - Vulnerable to a disk controller failure
  - Disk duplexing: mirroring with dual disk controllers
- RAID Level 2
  - Specialized form of disk striping with parity that is not widely used
  - Uses the Hamming code for parity
  - No commercial implementations of this level
23. Redundancy-Based Backup and Recovery Using RAID (continued)
- RAID Levels 3 and 4
  - RAID 3 uses byte-level striping, while RAID 4 uses block-level striping
  - Parity information is stored on a separate drive and provides error recovery
- RAID Level 5
  - Balances safety and redundancy against costs
  - Stripes data across multiple drives
  - Parity is interleaved with data segments on all drives
  - Hot-swappable drives can be replaced without shutting down the system
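The slides say that parity provides error recovery but do not show how. In RAID 3, 4, and 5 the parity block is conventionally the bitwise XOR of the data blocks in a stripe, so any single missing block can be recomputed from the survivors. The sketch below shows that idea on in-memory byte strings. The helper names are hypothetical, and the rotation of parity across drives in RAID 5 is not modeled.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_blocks, parity):
    """Recompute the one missing data block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# One stripe spread across three data drives, plus its parity block.
stripe = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_blocks(stripe)

# Simulate losing the second drive and rebuilding its block.
recovered = rebuild([stripe[0], stripe[2]], parity)
```

Because XOR is its own inverse, the same function computes the parity and performs the rebuild. This protects against the loss of one block per stripe only.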
24. Redundancy-Based Backup and Recovery Using RAID (continued)
- RAID Level 6
  - Extends RAID 5 striping with a second, independent set of parity data
  - Performs two different parity computations, or the same computation on overlapping subsets of data
- RAID Level 7
  - Proprietary variation on RAID 5 in which the array works as a single virtual drive
  - May be implemented via software running on RAID 5 hardware
- RAID Level 10
  - Combination of RAID 1 and RAID 0
26. Database Backups
- Databases require special considerations when planning backup and recovery procedures:
  - Are special utilities required to perform database backups?
  - Can the database be backed up without interrupting its use?
  - Are there additional journal files or database system files that are required in order to use backup tapes or disk images?
27. Application Backups
- Some applications use file systems and databases in unusual ways
- Members of the application development and support teams should be involved in the planning process
28. Backup and Recovery Plans
- The backup and recovery setting should be provided with complete recovery plans
- Plans need to be developed, tested, and rehearsed periodically
- Plans should include information about:
  - How and when backups are created and verified
  - Who is responsible for backup creation and verification
  - Storage and retention of backup media
  - Review cycle of the plan
  - Rehearsal of the plan
29. Real-Time Protection, Server Recovery, and Application Recovery
- Entire servers can be mirrored to provide real-time protection and recovery in a strategy of hot, warm, and cold servers
  - Hot server: the server in production
  - Warm server: backup server that is running and may handle overflow work from the hot server
  - Cold server: offline, test server
- If the hot server goes down, the warm and cold servers are promoted while the hot server is being repaired
- Bare metal recovery: technologies designed to replace operating systems and services when they fail
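The promotion step described above can be sketched as a simple role change across a pool of servers. The `Server` class, the role strings, and the extra "repair" role for the failed machine are assumptions for illustration. A real failover would also involve health checks, data synchronization, and redirecting clients.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    role: str  # "hot", "warm", "cold", or "repair"


# Each surviving server moves up one tier when the hot server fails.
PROMOTION = {"warm": "hot", "cold": "warm"}


def fail_over(servers):
    """Take the hot server out for repair and promote the warm and cold servers."""
    for server in servers:
        if server.role == "hot":
            server.role = "repair"
        elif server.role in PROMOTION:
            server.role = PROMOTION[server.role]
    return servers


pool = [Server("srv-a", "hot"), Server("srv-b", "warm"), Server("srv-c", "cold")]
fail_over(pool)
roles = {s.name: s.role for s in pool}
```

After the call, `srv-b` carries production load and `srv-c` becomes the running standby, leaving no cold server until `srv-a` is repaired and returned to the pool.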
30. Real-Time Protection, Server Recovery, and Application Recovery (continued)
- Application recovery (or clustering plus replication)
  - Applications are installed on multiple servers
  - If one fails, the secondary systems take over the role
- Electronic vaulting
  - Bulk transfer of data in batches to an off-site facility
  - Receiving server archives the data
  - Can be more expensive than tape backup and slower than data mirroring
  - Data must be encrypted for transfer over public infrastructure
32. Real-Time Protection, Server Recovery, and Application Recovery (continued)
- Remote journaling (RJ)
  - Transfer of live transactions to an off-site facility
  - Only transactions are transferred in near real-time to a remote location
  - Facilitates the recovery of key transactions in near real-time
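Remote journaling ships only the transactions, not the data set, so recovery means replaying the journal on top of the last backup. The sketch below models that with in-memory structures. The `Primary` class, the `(key, delta)` transaction shape, and the plain list standing in for the off-site facility are all assumptions for illustration.

```python
def apply(state, txn):
    """Apply one (key, delta) transaction to a dict of balances."""
    key, delta = txn
    state[key] = state.get(key, 0) + delta


class Primary:
    """Production system that journals every committed transaction off-site."""

    def __init__(self, remote_journal):
        self.state = {}
        self.remote_journal = remote_journal  # stands in for the off-site facility

    def commit(self, txn):
        apply(self.state, txn)
        self.remote_journal.append(txn)       # only the transaction is shipped


def recover(last_backup, journal_since_backup):
    """Rebuild current state from the last backup plus the journaled transactions."""
    state = dict(last_backup)
    for txn in journal_since_backup:
        apply(state, txn)
    return state


journal = []
primary = Primary(journal)
primary.commit(("acct1", 100))
backup, mark = dict(primary.state), len(journal)   # backup taken at this point
primary.commit(("acct1", -30))
primary.commit(("acct2", 5))
restored = recover(backup, journal[mark:])
```

The restored state matches the primary because every transaction committed after the backup was also journaled. Anything committed but not yet shipped would be lost, which is why the transfer runs in near real-time.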
34. Real-Time Protection, Server Recovery, and Application Recovery (continued)
- Database shadowing (or databank shadowing)
  - Storage of duplicate online transaction data and duplication of databases at a remote site on a redundant server
  - Both databases are updated, but only the primary responds to the user
  - Combines electronic vaulting with remote journaling
  - Used when immediate data recovery is a priority
  - Also used for data warehousing, data mining, batch reporting, complex SQL queries, local access at the shadow site, and load balancing
36. Real-Time Protection, Server Recovery, and Application Recovery (continued)
- Network-attached storage (NAS)
  - Usually a single device or server attached to a network to provide online storage
  - Not well suited for real-time applications due to latency
- Storage area networks (SANs)
  - Online storage devices connected by Fibre Channel direct connections between the servers and the additional storage
39. Site Resumption Strategies
- If the primary business site is not available, alternative processing capability may be needed
- The contingency planning management team (CPMT) can choose from several strategies for business resumption planning
- Exclusive control options:
  - Hot sites
  - Warm sites
  - Cold sites
- Shared-use options:
  - Timeshare
  - Service bureaus
  - Mutual agreements
40. Exclusive Site Resumption Strategies
41. Exclusive Site Resumption Strategies (continued)
- Hot site
  - Fully configured computer facility
  - Duplicates computing resources, peripherals, phone systems, applications, and workstations
  - Can be 24/7 if desired
  - Can be a mirrored site that is identical to the primary site
43. Exclusive Site Resumption Strategies (continued)
- Warm site
  - Provides some of the same services and options as a hot site
  - May include computing equipment and peripherals but not workstations
  - Has access to data backups or off-site storage
  - Lower cost than a hot site, but takes more time to be fully functional
45. Exclusive Site Resumption Strategies (continued)
- Cold site
  - Provides only rudimentary services and facilities
  - No computer hardware or software is provided
  - Communications services must be installed when the site is occupied
  - Often no quick recovery or data duplication functions on site
  - Primary advantage is cost
47. Exclusive Site Resumption Strategies (continued)
- Other options
  - Rolling mobile site configured in the payload area of a tractor-trailer
  - Rental storage area with duplicate or second-generation equipment
  - Mobile temporary offices
49. Shared Site Resumption Strategies
- Timeshare
  - Leased site shared with other organizations
  - Possibility that more than one organization might need the facility simultaneously
- Service bureaus
  - Service agency that provides physical facilities in the event of a disaster
  - May provide off-site data storage
50. Shared Site Resumption Strategies (continued)
- Mutual agreement
  - Contract between two organizations to provide mutual assistance in the event of a disaster
  - Each organization is obligated to provide facilities, resources, and services to the other
  - Good for divisions of the same parent company, between business partners, or when both parties have similar capabilities and capacities
  - A memorandum of agreement (MOA) should be drawn up with specific details
51. Service Agreements
- Service agreement
  - A contractual document guaranteeing certain minimum levels of service provided by a vendor
- A service agreement should specify:
  - The parties in the agreement
  - Services to be provided by the vendor
  - Fees and payments for those services
  - Statements of indemnification
  - Nondisclosure agreements and intellectual property assurances
  - Noncompetitive agreements
52. Summary
- Contingency planning includes everything done to prepare for the unexpected and recover from it
- The BR plan includes the DR plan for resuming operations at the primary site and the BC plan for moving to an alternate site if needed
- Five procedural mechanisms for restoration of critical data: delayed protection, real-time protection, server recovery, application recovery, and site recovery
- A backup plan is essential
- The retention period for backups must be specified
53. Summary (continued)
- Three types of backups: full, differential, and incremental
- RAID systems provide online disk drives for redundancy
- Databases require special considerations for backup and recovery planning
- Mirroring and duplication of server data storage provide real-time protection
- Electronic vaulting, remote journaling, and database shadowing store data at remote locations
54. Summary (continued)
- Business resumption strategies include hot sites, warm sites, cold sites, timeshare, service bureaus, and mutual agreements
- Service agreements guarantee certain minimum levels of service by the vendor