Storage Network Designs for OLTP Business Continuity

Title: Storage Management 2003 Last modified by: TSousa Created Date: 2/5/2002 6:20:53 PM Document presentation format: On-screen Show Company: TechTarget – PowerPoint PPT presentation

1
Storage Network Designs for OLTP Business
Continuity
  • Marc Farley
  • President, Building Storage Networks, Inc.

2
Agenda
  • The Vendor Neutral Approach
  • Overview of OLTP High Availability
  • I/O Redundancy Methods
  • Storage Network Technologies
  • Storage Networking for HA OLTP

3
Vendor Neutral Approach
  • Generic terms, not vendor terms
  • Assumed basic knowledge of SAN, NAS, RAID

4
And now, for something completely different..
5
OLTP Environments
  • Mission critical business applications
  • Business in real-time
  • Expensive equipment and software
  • Aggressive performance objectives
  • Highly skilled IT staff
  • Hands-on computing operations

6
OLTP Database Software
  • Oracle
  • 8i Oracle Parallel Server (OPS)
  • 9i Real Application Cluster (RAC)
  • IBM
  • DB2 UDB
  • Informix
  • MS SQL Server
  • Sybase, MySQL, others

7
OLTP OS Platforms
  • IBM S/390 MVS
  • Unix Systems
  • Windows 2000
  • HA Linux

8
OLTP Requirements
  • 99.999% uptime
  • Non-degrading response time
  • High transaction rates
  • Seamless scalability
  • Cost relief
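
To make the uptime target concrete, here is a minimal sketch of the arithmetic behind it. The function name and the 365.25-day year are assumptions made for illustration, not part of the original slides.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, including leap days (assumption)


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime allowed by an availability target."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return MINUTES_PER_YEAR * unavailable_fraction


for target in (99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, five nines leaves a little over five minutes of downtime per year for everything: failures, upgrades and operator errors combined.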

9
Database Storage Approaches
  • Raw partitions
  • Bypass OS I/O buffering
  • File system
  • Facilitates data management
  • NFS mounted
  • Offloads the DB server (e.g., NetApp with Oracle)

10
ACID Properties of OLTP
  • Atomicity: No partial transactions
  • Consistency: All tables are in a consistent state before and
    after a completed transaction
  • Isolation: One transaction cannot contaminate other
    transactions
  • Durability: Transactions are complete only when the database
    updates are written to disk storage
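
Atomicity is the easiest of the four properties to demonstrate. The sketch below uses Python's built-in sqlite3 module with an in-memory database. The accounts table, the transfer function and the balances are hypothetical, and SQLite only stands in for the OLTP databases named earlier.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
    """Move funds between two accounts as a single atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Raising here undoes BOTH updates: no partial transaction survives.
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

ok = transfer(conn, 1, 2, 30)       # succeeds and is committed
failed = transfer(conn, 1, 2, 500)  # would overdraw, so it is rolled back
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

After both calls the balances reflect only the first transfer. The in-memory database says nothing about durability, which depends on writes reaching disk storage.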
11
Challenges of OLTP
  • Major systems integration effort
  • Intricate tuning and monitoring
  • Little tolerance for errors
  • Complex data structures and relationships
  • Time and sequence-sensitive processes
  • Must be adhered to for data integrity
  • Shifting workloads and bottlenecks

12
OLTP Database Files
  • Data files
  • Database data, tablespaces
  • Redo log files, archive log files
  • Reconstruct or rollback transactions
  • Control files
  • File layout information

13
OLTP Table Space Storage
  • Use many spindles to distribute hot spots
  • RAID 0+1 recommended
  • File system recommended over raw partitions
  • Easier data management

14
Striping for Performance
RAID controller (microsecond performance)
[Figure: one RAID controller striping data across six disk drives]
Disk drives (millisecond performance, from rotational latency and
seek time)
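
Striping spreads consecutive blocks across the drives so that several slow spindles can work at once. A minimal sketch of the address mapping, assuming plain RAID 0 with a fixed stripe unit size (the function and parameter names are illustrative only):

```python
def locate_block(logical_block: int, num_disks: int, stripe_blocks: int):
    """Map a logical block to (disk index, block offset on that disk)."""
    stripe_unit = logical_block // stripe_blocks  # which stripe unit, counted overall
    disk = stripe_unit % num_disks                # stripe units rotate across the disks
    row = stripe_unit // num_disks                # complete rows written before this unit
    offset = row * stripe_blocks + logical_block % stripe_blocks
    return disk, offset


# Six drives, as in the figure, with a hypothetical stripe unit of 4 blocks.
for block in (0, 4, 23, 24):
    print(block, locate_block(block, num_disks=6, stripe_blocks=4))
```

Sequential logical blocks land on different drives every few blocks, which is what distributes hot spots across many spindles.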
15
My Personal Favorite, RAID 0+1
[Figure: one RAID controller with ten disk drives, grouped into
five numbered pairs]
Mirrored pairs of striped members
16
OLTP Redo Log Storage
  • Raw partitions recommended
  • Sequential high speed writes
  • Separate mirror pairs per log file group
  • Capacity for 30-60 minutes of data
  • Goal is to limit disk contention for current and
    active log files
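
The 30-60 minute guideline turns into a capacity figure once the redo generation rate is known. A rough sizing sketch follows. The 2 MB/s rate is a made-up example, and the helper name is hypothetical; a real rate has to be measured on the system in question.

```python
def redo_capacity_gb(redo_mb_per_sec: float, minutes: float) -> float:
    """Capacity needed to hold `minutes` of redo at a steady generation rate."""
    return redo_mb_per_sec * 60 * minutes / 1024


RATE_MB_PER_SEC = 2.0  # hypothetical measured redo rate

low = redo_capacity_gb(RATE_MB_PER_SEC, 30)
high = redo_capacity_gb(RATE_MB_PER_SEC, 60)
print(f"Plan for roughly {low:.1f} to {high:.1f} GB per mirror of the redo logs")
```

Because each log file group sits on its own mirrored pair, the raw disk space needed is at least double this figure.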

17
OLTP Archive Log Storage
  • File system or NFS mounting is required
  • NFS mounting is recommended
  • Mirroring or RAID
  • Goal is to have easy access in case they are
    needed for reconstruction

18
High Availability
  • The ability for a system or application to
    immediately continue its mission after loss or
    damage to system components, systems, facilities
    and data

19
Availability Threats
  • Expected
  • Scaling limitations
  • Processor
  • Storage capacity
  • Network
  • Consolidations
  • Product life cycles
  • Unexpected
  • Failures
  • Bugs
  • Virus
  • Operator errors
  • Disasters

20
HA Engages All Elements
  • Systems
  • Application
  • Network connections
  • Network services
  • Storage and I/O subsystems

21
Scoping the Risks
              System                    Network                      Storage
Component     HBA                       Cable                        Disk drive
System        Server                    Switch                       Subsystem
Pathological  Virus attack on platform  Service provider outage      Environmental media loss
Site          Server rooms gutted       All external communications  Total data loss
22
Managing the Risks
  • Local copies of data
  • Immediate availability
  • Remote, nearby
  • Immediate availability to several hours
  • Remote, far away
  • One to several days availability

23
Disaster/Availability Radii
Local
Remote, nearby
Remote, far away
24
Nobody Expects..
  • Weird things to happen to them
  • Disintegration of media
  • Underground flooding through tunnels
  • Fires in Telco switching centers

25
High Availability for OLTP
  • Duplication of functions
  • Without degrading performance
  • Without risking data integrity
  • Brute force techniques
  • Automation and efficiency
  • Cost is always an issue
  • And high availability DOES cost

26
A Long Time Ago in a Job Not So Far Away.
You must learn to be a master of redundancy
if you are going to be a storage geek.
Remember Marc, there is only one concept:
REDUNDANCY!
Redundancy. Again!
Got it, Jim. Let's eat!
Whatever
27
Eventually, I Learned to Appreciate His
Teachings
  • REDUNDANCY = NSPoF (No Single Point of Failure)

Don't get the giant spicy Polish for lunch;
it's too much for the digestion
28
OLTP HA Requires Complete Redundancy Protection
  • Client network
  • Server systems and components
  • Application modules
  • I/O Channels and Networks
  • Storage subsystems and components
  • Data
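
Why every element in the list needs protection can be shown with the standard series and parallel availability formulas. This is a sketch under a strong assumption: failures are independent, which real sites rarely guarantee. The 99.9% per-element figure is hypothetical.

```python
def parallel(availability: float, copies: int) -> float:
    """Availability of redundant units when any one of them is enough."""
    return 1.0 - (1.0 - availability) ** copies


def series(availabilities) -> float:
    """Availability of a chain in which every element must be up."""
    total = 1.0
    for a in availabilities:
        total *= a
    return total


ELEMENTS = ["client network", "servers", "application",
            "I/O channels", "storage", "data"]
SINGLE = 0.999  # hypothetical availability of one unprotected element

unprotected = series([SINGLE] * len(ELEMENTS))
protected = series([parallel(SINGLE, 2)] * len(ELEMENTS))
print(f"no redundancy: {unprotected:.4%}   duplicated elements: {protected:.4%}")
```

With these numbers the unprotected chain comes out at about 99.4%, while duplicating every element lifts it above 99.999%. Leaving a single element unduplicated pulls the whole chain back toward that element's own availability.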

29
A Quick Look At Clustered Storage
  • Shared Everything: Both servers share control of a common
    storage address space
  • Shared Nothing: Each server controls its own storage address
    space
30
Examples of OLTP Clusters
Microsoft SQL Server
Oracle 9i RAC
Data is exchanged between servers
Data is accessed directly from storage
Failover paths only
31
One more time, with subsystems
Microsoft SQL Server
Oracle 9i RAC
All storage is shared by all cluster nodes
Same subsystem but different address spaces
32
I/O Redundancy
  • Host to subsystem
  • Mirroring: host to independent targets
  • Multi-pathing: host to a single target
  • Subsystem to subsystem
  • Store and forward
  • Local
  • Remote

33
Disk Mirroring: Redundant Storage Targets
Independent, identically sized storage address
spaces
One controller
Two controllers
34
Disk Mirroring: I/Os to 2 Targets
  • Brute force redundancy: fast and simple
  • Both read and write I/Os
  • Overlapped reads for performance
  • Local connections
  • Limited capacity
  • I/O Bottlenecks for random I/O activity
  • if targets are disk drives
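
The two behaviors listed above, duplicated writes and overlapped reads, can be sketched in a few lines. This is a toy in-memory model, not a driver. The class name and the round-robin read policy are assumptions chosen for illustration.

```python
from itertools import cycle


class MirroredVolume:
    """Toy model of host-based mirroring to two independent targets."""

    def __init__(self, num_blocks: int):
        # Two independent, identically sized storage address spaces.
        self.targets = [[None] * num_blocks, [None] * num_blocks]
        self._next_reader = cycle(range(2))
        self.reads_per_target = [0, 0]

    def write(self, block: int, data: bytes) -> None:
        # Every write goes to both targets.
        for target in self.targets:
            target[block] = data

    def read(self, block: int) -> bytes:
        # Reads alternate between targets so both spindles share the load.
        index = next(self._next_reader)
        self.reads_per_target[index] += 1
        return self.targets[index][block]


volume = MirroredVolume(num_blocks=8)
volume.write(3, b"redo record")
results = [volume.read(3) for _ in range(4)]
```

Four reads are served two per target, while the single write costs two I/Os. That asymmetry is why mirroring helps read-heavy workloads more than write-heavy ones.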

35
Disk Mirroring for Redo Log Files
  • Log files are a common bottleneck
  • Use raw partitions
  • Redundancy is required
  • Mirroring is adequate
  • Use highest RPM with lowest seek times
  • Put on a separate channel from database I/O
  • Use separate mirrored pairs per group

36
Mirroring to Storage Subsystems
[Figure: a host with two controllers, each attached to its own
storage subsystem]
Independent, identically sized storage address spaces
37
Mirroring to Subsystems
  • Targets are subsystems, not disks
  • Separate address spaces
  • Capacity scales to subsystem max
  • Double level redundancy
  • Mirroring plus RAID
  • Multiple disk spindles reduce I/O bottlenecks

38
Disk Mirroring Datafiles from Host to Storage
Subsystems
  • Disk mirroring + subsystem RAID
  • Excellent capacity scaling
  • Adjacent and across campus/town
  • One subsystem outside site radius
  • Requires longer distance cabling
  • Reads and writes both transmitted

39
Multi-Pathing: Redundant Paths Between a Host and a
Subsystem
[Figure: a host with two paths to an application data volume; one
path is marked failed (X)]
Pathing software determines that a transmission error has
occurred and switches to a redundant path
40
Multi-pathing vs Mirroring
  • Mirroring assumes independent, but similar
    storage targets
  • Multi-pathing assumes multiple paths to the exact
    same target
  • Mirroring can use a single HBA, multi-pathing
    needs two HBAs

41
Path Failures
[Figure: the path from the host to the application data volume,
with three numbered failure points]
1. HBA problem
2. Link, switch or network problem
3. Subsystem controller problem
42
Transmission failures are recognized after SCSI
timeouts are exceeded
The I/O is retried and eventually an error is
passed back to the process that issued the I/O
43
Path Failover for OLTP I/O
  • Redundant path resources take over activities for
    a failed path to sustain operations without
    disrupting service or risking data integrity
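
The retry-then-failover behavior can be sketched as follows. Real pathing software lives in the driver stack; here each path is a plain Python callable, and the names PathError, issue_io and the two example paths are all hypothetical.

```python
class PathError(Exception):
    """Stands in for an I/O that exceeded its timeout on one path."""


def issue_io(paths, request, retries_per_path=2):
    """Try each path in turn; fail only after every path is exhausted."""
    for send in paths:
        for _ in range(retries_per_path):
            try:
                return send(request)
            except PathError:
                continue  # retry on the same path, then move to the next one
    # Only now does the error reach the process that issued the I/O.
    raise PathError("all paths failed")


attempts = []


def failed_path(request):
    attempts.append("path A")
    raise PathError("timeout")


def healthy_path(request):
    attempts.append("path B")
    return f"done: {request}"


result = issue_io([failed_path, healthy_path], "write block 42")
```

The application sees a completed write even though path A timed out twice. The cost is the time spent waiting for those timeouts, which is why the timeout and retry settings matter for OLTP response times.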

44
Store and Forward
Independent, identically sized storage address
spaces
[Figure: a host and two storage targets, A and B]
45
Store and Forward: One Host I/O and Two Copies of
Data
  • Only real option for remote copies
  • Does not forward read I/Os
  • Proprietary protocols and methods
  • Standards are emerging, e.g., FC/IP
  • First step to storage snapshots

46
Store and Forward Acknowledgements
[Figure: acknowledgement paths between storage targets A and B,
shown for the asynchronous and the synchronous case]
47
Trade-offs with Acknowledgement Handling
  • Synchronous
  • Always preferred
  • Slowest performance
  • State of copy is precise
  • Asynchronous
  • Fastest performance
  • Least precise knowledge of copy status
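
The trade-off comes down to when the host gets its acknowledgement. The toy model below makes that visible. The class and method names are hypothetical, and the "remote" copy is just a second dictionary rather than a real link.

```python
from collections import deque


class StoreAndForward:
    """Toy model of a primary target that forwards writes to a remote copy."""

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.primary = {}
        self.remote = {}
        self.pending = deque()  # writes acknowledged but not yet forwarded

    def write(self, block: int, data: str) -> str:
        self.primary[block] = data
        if self.synchronous:
            # The host waits until the remote copy also holds the data.
            self.remote[block] = data
        else:
            # The host is acknowledged immediately; forwarding happens later.
            self.pending.append((block, data))
        return "ack"

    def drain(self) -> None:
        """Forward all queued writes, in order."""
        while self.pending:
            block, data = self.pending.popleft()
            self.remote[block] = data

    def remote_is_current(self) -> bool:
        return self.remote == self.primary


sync_copy = StoreAndForward(synchronous=True)
async_copy = StoreAndForward(synchronous=False)
for copy in (sync_copy, async_copy):
    copy.write(7, "order #1001")

sync_state = sync_copy.remote_is_current()
async_state_before = async_copy.remote_is_current()
async_copy.drain()
async_state_after = async_copy.remote_is_current()
```

In synchronous mode the remote copy is always current at the moment of the acknowledgement. In asynchronous mode anything still in the pending queue is lost if the primary site fails before it drains.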

48
Store and Forward: Local and Remote Copies
  • Local, nearby copy techniques
  • Synchronous
  • Fiber optic cabling, optical/DWDM services
  • Remote, far away copy techniques
  • Asynchronous
  • ATM gateways, OC-12 or less, FC/IP

49
Mirroring vs Synchronous Store and Forward for
Local Nearby Copies
  • Mirroring
  • Async I/O
  • Reads and writes
  • No snapshot tie-in
  • Uses more host slots
  • Least costly
  • Store and Forward
  • Async or Sync I/O
  • Writes only
  • Snapshot ready
  • May conserve host I/O slots
  • Most costly

50
Combining Mirroring with Store and Forward
Store and Forward Radius
Local
Nearby
Remote Far Away
Mirroring Radius
51
Data Redundancy for OLTP
  • Backup
  • Snapshots
  • Delta (log files)

52
Backup for OLTP
  • A whole subject unto itself
  • Disaster recovery primarily
  • Cold? Who can afford to do that anymore?
  • Hot: put the DB in backup mode
  • Backup snapshot image of data

53
Subsystem Snapshots for OLTP
1. Flush host buffers (sync, sync)
2. Create snapshot
[Figure: a database server attached to disk storage subsystems A,
B and C]
54
Logical Snapshots for OLTP
1. The address space is mapped
2. First updates
3. Second updates
Overwritten data locations are not returned to
the free space pool. (Undelete)
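
One way to read the three steps is as a redirect-on-write scheme: the snapshot is a saved copy of the block map, and updates go to fresh locations while the old ones are kept. The sketch below follows that reading. It is an interpretation rather than any specific product's design, and every name in it is hypothetical.

```python
class SnapshotVolume:
    """Toy logical snapshot: old data locations are never reused."""

    def __init__(self):
        self.store = []       # physical locations, append-only
        self.block_map = {}   # logical block -> physical location
        self.snapshots = []   # saved copies of the block map

    def write(self, block: int, data: str) -> None:
        self.store.append(data)                      # new data gets a fresh location
        self.block_map[block] = len(self.store) - 1  # the old location stays intact

    def snapshot(self) -> int:
        """Step 1: map the address space as it stands right now."""
        self.snapshots.append(dict(self.block_map))
        return len(self.snapshots) - 1

    def read(self, block: int, snapshot_id=None) -> str:
        mapping = self.block_map if snapshot_id is None else self.snapshots[snapshot_id]
        return self.store[mapping[block]]


volume = SnapshotVolume()
volume.write(0, "balance=100")
snap = volume.snapshot()
volume.write(0, "balance=70")   # first update
volume.write(0, "balance=40")   # second update

current = volume.read(0)
frozen = volume.read(0, snapshot_id=snap)
```

The live volume shows the latest update while the snapshot still returns the original data, because the overwritten location was never returned to the free pool.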
55
Delta Redundancy with Log Files
  • Recording of all transaction activities
  • Roll forward: bring up to date
  • Roll backward: go to a known good state
  • Terrific tool for remote redundancy
  • Not HA
  • Process cannot have holes in it
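
The "no holes" rule can be enforced mechanically when logs are applied at the remote site. A minimal sketch, assuming each shipped log carries a sequence number and a set of changes (both are illustrative simplifications of real redo logs):

```python
def roll_forward(state: dict, logs, last_applied: int):
    """Apply logs in sequence order and refuse to skip over a missing one."""
    state = dict(state)  # work on a copy so the caller's state is untouched
    for sequence, changes in sorted(logs, key=lambda entry: entry[0]):
        if sequence <= last_applied:
            continue  # already applied earlier
        if sequence != last_applied + 1:
            raise ValueError(f"log {last_applied + 1} is missing")
        state.update(changes)
        last_applied = sequence
    return state, last_applied


base = {"acct1": 100}
complete = [(2, {"acct1": 70}), (1, {"acct2": 30}), (3, {"acct2": 60})]
with_hole = [(1, {"acct2": 30}), (3, {"acct2": 60})]

recovered, applied = roll_forward(base, complete, last_applied=0)

try:
    roll_forward(base, with_hole, last_applied=0)
    hole_detected = False
except ValueError:
    hole_detected = True
```

Logs may arrive out of order, but they must be applied in order. A single missing log stops recovery at that point, which is why the transport for archived logs deserves as much care as the logs themselves.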

56
Remote Redundancy w/ Log Files
d(x) = f(x) - f(x-1)
  • f(x-1): Previous instance
  • d(x): Latest redo log file
  • f(x): Current to log file switch checkpoint
57
And now, some thoughts from our sponsor..
How come I always end up doing all the work?
He never does anything except eat and sleep
Redundancy is a way of life
Managing Redundancy is Hard Work
58
SAN Considerations
  • Fabrics and SAN Islands
  • Zoning
  • Switches and directors
  • Multiplexing (oversubscribing)
  • Security

59
Fabrics ARE the SAN Environment
  • One size does not fit all applications
  • Larger fabrics carry more risks
  • VSANs are probably a good idea
  • Only use switches supporting hot, stateful
    firmware upgrades

60
SAN Islands May be Best for OLTP
  • Most risk averse approach
  • Dual fabrics, one fabric per I/O path
  • Switch problems do not cascade
  • But, higher management costs
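
Whether a design really has one fabric per I/O path can be checked by looking for components that every path shares. A small sketch, with made-up component names:

```python
def single_points_of_failure(paths):
    """Return the components that every path depends on."""
    component_sets = [set(path) for path in paths]
    return set.intersection(*component_sets) if component_sets else set()


# Each path lists the components between the host and the subsystem.
shared_fabric = [
    ["hba0", "switch1", "controller_a"],
    ["hba1", "switch1", "controller_b"],
]
dual_fabric = [
    ["hba0", "switch1", "controller_a"],
    ["hba1", "switch2", "controller_b"],
]

shared_result = single_points_of_failure(shared_fabric)
dual_result = single_points_of_failure(dual_fabric)
```

The first layout has two HBAs and two controllers but still depends on one switch. The second has no common component, so a switch problem stays inside its own fabric.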

61
Zoning OLTP
  • All ports defined to zones
  • No rogue ports and zombie zones
  • Restrict access to current servers
  • Need-to-access only
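
These rules lend themselves to a simple audit. The sketch below models zones as sets of port names, which is a simplification of how switches store them. The zone and port names are invented for the example.

```python
def can_access(zones: dict, initiator: str, target: str) -> bool:
    """Two ports may talk only if at least one zone contains both."""
    return any(initiator in members and target in members
               for members in zones.values())


def unzoned_ports(zones: dict, all_ports) -> set:
    """Ports that belong to no zone at all (candidates for rogue ports)."""
    zoned = set().union(*zones.values())
    return set(all_ports) - zoned


zones = {
    "oltp_data": {"db1_hba0", "array_a_port0"},
    "oltp_logs": {"db1_hba1", "array_b_port0"},
}
fabric_ports = ["db1_hba0", "db1_hba1", "array_a_port0",
                "array_b_port0", "test_hba0"]

allowed = can_access(zones, "db1_hba0", "array_a_port0")
blocked = can_access(zones, "test_hba0", "array_a_port0")
rogue = unzoned_ports(zones, fabric_ports)
```

Running a check like this against the real switch configuration after every change is one way to keep rogue ports and leftover zones from accumulating.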

62
Switches and Directors
  • Redundancy eats slots and ports
  • Pathing, mirroring
  • Separate channels for data and logs
  • Avoid traversing ISLs, if possible
  • Added latency and blocking potential
  • Trunking must have NSPoF

63
Security
  • Admin security for an OLTP SAN should be as
    strong as possible
  • No monkey business
  • No default passwords left
  • WAN encryption of log files

64
Recommendations
  • Determine OLTP availability needs
  • Where copies should be, time to access
  • Match storage network implementation to DB file
    types
  • Develop availability-driven policies
  • Equipment
  • Processes