Transactions and Reliability

Provided by: csFsuEdu5
Learn more at: http://www.cs.fsu.edu

1
Transactions and Reliability
  • Andy Wang
  • Operating Systems
  • COP 4610 / CGS 5765

2
Motivation
  • File systems have lots of metadata
  • Free blocks, directories, file headers, indirect
    blocks
  • Metadata is heavily cached for performance

3
Problem
  • System crashes
  • OS needs to ensure that the file system does not
    reach an inconsistent state
  • Example: move a file between directories
  • Remove a file from the old directory
  • Add a file to the new directory
  • What happens when a crash occurs in the middle?

4
UNIX File System (Ad Hoc Failure-Recovery)
  • Metadata handling
  • Uses a synchronous write-through caching policy
  • A call to update metadata does not return until
    the changes are propagated to disk
  • Updates are ordered
  • When crashes occur, run fsck to repair
    in-progress operations

5
Some Examples of Metadata Handling
  • Undo effects not yet visible to users
  • If a new file is created, but not yet added to
    the directory
  • Delete the file
  • Continue effects that are visible to users
  • If file blocks are already allocated, but not
    recorded in the bitmap
  • Update the bitmap

6
UFS User Data Handling
  • Uses a write-back policy
  • Modified blocks are written to disk at 30-second
    intervals
  • Unless a user issues the sync system call
  • Data updates are not ordered
  • In many cases, consistent metadata is good enough

7
Example: Vi
  • Vi saves changes by doing the following:
  • 1. Writes the new version in a temp file
  • Now we have old_file and new_temp file
  • 2. Moves the old version to a different temp
    file
  • Now we have new_temp and old_temp
  • 3. Moves the new version into the real file
  • Now we have new_file and old_temp
  • 4. Removes the old version
  • Now we have new_file

8
Example: Vi
  • When crashes occur
  • Looks for the leftover files
  • Moves forward or backward depending on the
    integrity of files
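
The save sequence and the crash recovery on the two Vi slides can be sketched in Python. This is a minimal sketch, not Vi's actual code: the .new and .old suffixes and the function names safe_save and recover are hypothetical, and it assumes step 1 finishes (including the fsync) before step 2 begins, so a leftover .new file found while the real file is missing is complete.

```python
import os


def safe_save(path, data):
    new_tmp, old_tmp = path + ".new", path + ".old"  # hypothetical names
    # 1. Write the new version to a temp file and force it to disk.
    with open(new_tmp, "w") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    # 2. Move the old version to a different temp file (if there is one).
    if os.path.exists(path):
        os.replace(path, old_tmp)
    # 3. Move the new version into the real file.
    os.replace(new_tmp, path)
    # 4. Remove the old version.
    if os.path.exists(old_tmp):
        os.remove(old_tmp)


def recover(path):
    new_tmp, old_tmp = path + ".new", path + ".old"
    if os.path.exists(path):
        # Crash during step 1 (new_tmp may be partial) or after step 3
        # (only old_tmp is left): the real file is intact, drop leftovers.
        for leftover in (new_tmp, old_tmp):
            if os.path.exists(leftover):
                os.remove(leftover)
    elif os.path.exists(new_tmp) and os.path.exists(old_tmp):
        # Crash between steps 2 and 3: move forward to the new version.
        os.replace(new_tmp, path)
        os.remove(old_tmp)
    elif os.path.exists(old_tmp):
        # Only the old version survives: move backward.
        os.replace(old_tmp, path)
```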

9
Transaction Approach
  • A transaction groups operations as a unit, with
    the following characteristics
  • Atomic: all operations either happen or they do
    not (no partial operations)
  • Serializable: transactions appear to happen one
    after the other
  • Durable: once a transaction happens, it is
    recoverable and can survive crashes

10
More on Transactions
  • A transaction is not done until it is committed
  • Once committed, a transaction is durable
  • If a transaction fails to complete, it must
    roll back as if it did not happen at all
  • Critical sections are atomic and serializable,
    but not durable

11
Transaction Implementation (One Thread)
  • Example: money transfer
  • Begin transaction
  • x = x - 1
  • y = y + 1
  • Commit

12
Transaction Implementation (One Thread)
  • Common implementations involve the use of a log,
    a journal that is never erased
  • A file system uses a write-ahead log to track all
    transactions

13
Transaction Implementation (One Thread)
  • Once the changes to accounts x and y are in the
    log, the log is committed to disk in a single write
  • Actual changes to those accounts are done later

14
Transaction Illustrated
x = 1, y = 1
x = 1, y = 1

15
Transaction Illustrated
x = 0, y = 2
x = 1, y = 1

16
Transaction Illustrated
x = 0, y = 2
x = 1, y = 1

17
Transaction Steps
  • Mark the beginning of the transaction
  • Log the changes in account x
  • Log the changes in account y
  • Commit
  • Modify account x on disk
  • Modify account y on disk

18
Scenarios of Crashes
  • If a crash occurs after the commit
  • Replays the log to update accounts
  • If a crash occurs before the commit
  • Rolls back and discards the transaction
  • A crash cannot leave the commit half-done
  • Commit is built as an atomic operation
  • e.g. writing a single sector on disk
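
The transaction steps and the crash scenarios above can be sketched with an in-memory model. This is a minimal sketch under stated assumptions, not a real file-system journal: the names transfer, recover, and crash_after are hypothetical, the "disk" is a dict, the log is a list holding one transaction, and the log records new values (not deltas) so that replaying it twice is harmless.

```python
def transfer(disk, log, crash_after=6):
    """Run the six transaction steps; stop after `crash_after` steps
    to simulate a crash (hypothetical parameter for illustration)."""
    new_x, new_y = disk["x"] - 1, disk["y"] + 1
    steps = [
        lambda: log.append(("begin",)),           # 1. mark the beginning
        lambda: log.append(("set", "x", new_x)),  # 2. log the change to x
        lambda: log.append(("set", "y", new_y)),  # 3. log the change to y
        lambda: log.append(("commit",)),          # 4. commit (atomic append)
        lambda: disk.update(x=new_x),             # 5. modify x on disk
        lambda: disk.update(y=new_y),             # 6. modify y on disk
    ]
    for step in steps[:crash_after]:
        step()


def recover(disk, log):
    """After a crash: replay a committed log, discard an uncommitted one."""
    if ("commit",) in log:
        for record in log:
            if record[0] == "set":
                _, key, value = record
                disk[key] = value  # replay is idempotent: values, not deltas
    else:
        log.clear()                # roll back: nothing reached the disk yet
```

Calling transfer(disk, log, crash_after=5) leaves x updated but y stale; recover(disk, log) then replays the committed log and brings both accounts to their new values.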

19
Two-Phase Locking (Multiple Threads)
  • Logging alone is not enough to prevent multiple
    transactions from trashing one another (not
    serializable)
  • Solution: two-phase locking
  • 1. Acquire all locks
  • 2. Perform updates and release all locks
  • Thread A cannot see thread B's changes until
    thread B commits and releases its locks
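
The two phases can be sketched with Python threads. This is a minimal sketch, assuming two accounts named x and y guarded by one lock each; the accounts and locks dictionaries and the transfer function are hypothetical. Acquiring locks in sorted name order is an addition not stated on the slide; it keeps two opposite transfers from deadlocking.

```python
import threading

accounts = {"x": 100, "y": 100}                     # hypothetical balances
locks = {name: threading.Lock() for name in accounts}


def transfer(src, dst, amount):
    needed = sorted({src, dst})   # fixed global order (assumption, avoids deadlock)
    # Phase 1: acquire every lock the transaction needs before touching data.
    for name in needed:
        locks[name].acquire()
    try:
        # Phase 2: perform the updates while holding all locks ...
        accounts[src] -= amount
        accounts[dst] += amount
    finally:
        # ... then release all locks; only now can others see the changes.
        for name in reversed(needed):
            locks[name].release()
```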

20
Transactions in File Systems
  • Almost all file systems built since 1985 use
    write-ahead logging
  • Windows NT, Solaris, OSF, etc.
  • Eliminates running fsck after a crash
  • Write-ahead logging provides reliability
  • - All modifications need to be written twice

21
Log-Structured File System (LFS)
  • If logging is so great, why don't we treat
    everything as log entries?
  • Log-structured file system
  • Everything is a log entry (file headers,
    directories, data blocks)
  • Write the log only once
  • Use version stamps to distinguish between old and
    new entries

22
More on LFS
  • New log entries are always appended to the end of
    the existing log
  • All writes are sequential
  • Seeks occur only during reads
  • Not so bad due to temporal locality and caching
  • Problem
  • Need to create more contiguous space all the time
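
The append-only idea can be sketched as follows. This is a minimal in-memory sketch, not a real LFS: the class LogStructuredStore and its methods are hypothetical, the version stamp is a simple counter, and the clean method (copying live entries into a fresh log) is one assumed way to reclaim contiguous space.

```python
class LogStructuredStore:
    def __init__(self):
        self.log = []      # append-only list of (version, name, data)
        self.latest = {}   # name -> index of the newest entry in the log
        self.version = 0   # version stamp, increases with every write

    def write(self, name, data):
        """Every update is appended to the end of the log; nothing is overwritten."""
        self.version += 1
        self.log.append((self.version, name, data))
        self.latest[name] = len(self.log) - 1

    def read(self, name):
        """Reads follow the index to the newest version of an entry."""
        return self.log[self.latest[name]][2]

    def clean(self):
        """Drop superseded entries so the log shrinks back to live data."""
        live = [self.log[i] for i in sorted(self.latest.values())]
        self.log = live
        self.latest = {name: i for i, (_, name, _) in enumerate(live)}
```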

23
RAID and Reliability
  • So far, we assume that we have a single disk
  • What if we have multiple disks?
  • The chance of a single-disk failure increases
  • RAID: redundant array of independent disks
  • Standard way of organizing disks and classifying
    the reliability of multi-disk systems
  • General methods: data duplication, parity, and
    error-correcting codes (ECC)

24
RAID 0
  • No redundancy
  • Uses block-level striping across disks
  • i.e., 1st block stored on disk 1, 2nd block
    stored on disk 2
  • Failure causes data loss
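
Block-level striping amounts to a simple mapping from a logical block number to a disk and a position on that disk. A minimal sketch, assuming zero-based numbering (the slide counts from 1) and a hypothetical function name locate_block:

```python
def locate_block(block, num_disks):
    """Return (disk index, block index on that disk) for a logical block."""
    return block % num_disks, block // num_disks
```

With four disks, logical blocks 0, 1, 2, 3 land on disks 0, 1, 2, 3, and block 4 wraps around to disk 0.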

25
Non-Redundant Disk Array Diagram (RAID Level 0)

[Diagram labels: open(foo), read(bar), write(zoo); File System]

26
Mirrored Disks (RAID Level 1)
  • Each disk has a second disk that mirrors its
    contents
  • Writes go to both disks
  • + Reliability is doubled
  • + Read access faster
  • - Write access slower
  • - Expensive and inefficient

27
Mirrored Disk Diagram (RAID Level 1)
[Diagram labels: open(foo), read(bar), write(zoo); File System]

28
Memory-Style ECC (RAID Level 2)
  • Some disks in array are used to hold ECC
  • + More efficient than mirroring
  • + Can correct, not just detect, errors
  • - Still fairly inefficient
  • e.g., 4 data disks require 3 ECC disks
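
The 4-data-disk, 3-ECC-disk figure matches a single-error-correcting Hamming code, where r check bits can cover d data bits when 2^r >= d + r + 1. A minimal sketch under that assumption (the slide does not name the code; ecc_disks_needed is a hypothetical helper):

```python
def ecc_disks_needed(data_disks):
    """Smallest r with 2**r >= data_disks + r + 1 (Hamming bound for
    single-error correction)."""
    r = 0
    while 2 ** r < data_disks + r + 1:
        r += 1
    return r
```

For 4 data disks this gives 3, since 2^3 = 8 = 4 + 3 + 1.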

29
Memory-Style ECC Diagram (RAID Level 2)
[Diagram labels: open(foo), read(bar), write(zoo); File System]

30
Bit-Interleaved Parity (RAID Level 3)
  • Uses bit-level striping across disks
  • i.e., 1st bit stored on disk 1, 2nd bit stored on
    disk 2
  • One disk in the array stores parity for the other
    disks
  • + More efficient than Levels 1 and 2
  • - Parity disk doesn't add bandwidth

31
Parity Method
  • Disk 1: 1001
  • Disk 2: 0101
  • Disk 3: 1000
  • Parity: 0100 = 1001 xor 0101 xor 1000
  • To recover disk 2
  • Disk 2: 0101 = 1001 xor 1000 xor 0100
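
The same arithmetic can be checked in a few lines of Python. This is a minimal sketch using the slide's 4-bit values as integers; the helper name parity is hypothetical.

```python
from functools import reduce
from operator import xor


def parity(blocks):
    """XOR all blocks together; used both to build parity and to rebuild a lost disk."""
    return reduce(xor, blocks)


disk1, disk2, disk3 = 0b1001, 0b0101, 0b1000
p = parity([disk1, disk2, disk3])      # parity of the three data disks
rebuilt = parity([disk1, disk3, p])    # disk 2 rebuilt from the survivors
```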

32
Bit-Interleaved RAID Diagram (Level 3)
[Diagram labels: open(foo), read(bar), write(zoo); File System]

33
Block-Interleaved Parity (RAID Level 4)
  • Like bit-interleaved, but data is interleaved in
    blocks
  • + More efficient data access than level 3
  • - Parity disk can be a bottleneck
  • - Small writes require 4 I/Os
  • Read the old block
  • Read the old parity
  • Write the new block
  • Write the new parity
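
The four I/Os work because the new parity can be computed from the old parity and the old and new contents of the one block being changed, without reading the other data disks. A minimal sketch, assuming each disk is a Python list of integer blocks and a hypothetical function name small_write:

```python
def small_write(data_disk, parity_disk, index, new_block):
    """Update one block and its parity using exactly four I/Os."""
    old_block = data_disk[index]       # I/O 1: read the old block
    old_parity = parity_disk[index]    # I/O 2: read the old parity
    data_disk[index] = new_block       # I/O 3: write the new block
    # I/O 4: write the new parity; XOR removes the old block's
    # contribution and adds the new block's.
    parity_disk[index] = old_parity ^ old_block ^ new_block
```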

34
Block-Interleaved Parity Diagram (RAID Level 4)
[Diagram labels: open(foo), read(bar), write(zoo); File System]

35
Block-Interleaved Distributed-Parity (RAID Level 5)
  • Arguably the most general-purpose level of RAID
  • Spreads the parity out over all disks
  • + No parity disk bottleneck
  • + All disks contribute read bandwidth
  • - Requires 4 I/Os for small writes
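
One way to spread parity over all disks is to rotate the parity position from stripe to stripe. This is a minimal sketch of a simple rotation, assumed for illustration only; real arrays use several different layouts, and raid5_layout is a hypothetical name.

```python
def raid5_layout(stripe, num_disks):
    """Return (parity disk, list of data disks) for a given stripe."""
    parity_disk = stripe % num_disks   # parity moves one disk per stripe
    data_disks = [d for d in range(num_disks) if d != parity_disk]
    return parity_disk, data_disks
```

Over any num_disks consecutive stripes, each disk holds parity exactly once, so no single disk absorbs every parity update.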

36
Block-Interleaved Distributed-Parity Diagram (RAID Level 5)
[Diagram labels: open(foo), read(bar), write(zoo); File System]