Lecture 4: A Case for RAID (Part 2) - PowerPoint PPT Presentation


Transcript and Presenter's Notes

Title: Lecture 4: A Case for RAID (Part 2)


1
Lecture 4: A Case for RAID (Part 2)
  • Prof. Shahram Ghandeharizadeh
  • Computer Science Department
  • University of Southern California

2
Smaller Inexpensive Disks
  • 25% annual reduction in size; 40% annual drop in
    price.

1 GB, Year 2008: IBM Microdrive @ $125; 1 inch in height, weighs about half an ounce (16 grams).
1 GB, Year 1980: IBM 3380 @ $40,000; the size of a refrigerator, 550 pounds (250 Kg).
3
Inexpensive Disks
  • Less than 9 cents per gigabyte of storage.

4
Challenge: Managing Data is Expensive
  • The cost of managing data is $100K/TB/Year.
  • High availability: down time is estimated at
    thousands of dollars per minute.
  • Data loss results in lost productivity:
  • 20 Megabytes of accounting data requires 21 days
    and costs $19K to reproduce.
  • 50% of companies that lose their data due to a
    disaster never re-open; 90% go out of business
    within 2 years!

5
Challenge: Managing Data is Expensive
RAID
6
MTTF, MTBF, MTTR, AFR
  • MTBF: Mean Time Between Failures
  • Designed for repairable devices.
  • The number of hours since the system was started
    until its failure.
  • MTTF: Mean Time To Failure
  • Designed for non-repairable devices such as
    magnetic disk drives.
  • Disks of 2008 are more than 40 times more
    reliable than disks of 1988.
  • MTTR: Mean Time To Repair
  • The number of hours required to replace a disk
    drive, AND
  • Reconstruct the data stored on the failed disk
    drive.
  • AFR: Annualized Failure Rate
  • Computed by assuming a temperature for the case
    (40 degrees centigrade), power-on hours per year
    (say 8,760, i.e., 24x7), and an average of 250 motor
    start/stop cycles per year. A sketch relating these
    quantities follows below.
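
Not on the slide, but as a rough illustration of how these quantities relate: a minimal sketch converting a rated MTTF into an AFR, assuming exponentially distributed lifetimes; the 1,000,000-hour rating below is an assumed value, not from the lecture.

```python
import math

# Sketch (assumed values): convert a rated MTTF to an annualized failure
# rate, assuming exponentially distributed lifetimes and the slide's
# 8,760 power-on hours per year.
MTTF_HOURS = 1_000_000      # assumed vendor rating, not from the slide
POWER_ON_HOURS = 8_760      # 24x7 operation, per the slide

afr = 1 - math.exp(-POWER_ON_HOURS / MTTF_HOURS)
print(f"AFR = {afr:.2%}")   # ~0.87% per year for this rating
```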

7
Focus on MTTF and MTTR
  • MTTF: Mean Time To Failure
  • Designed for non-repairable devices such as
    magnetic disk drives.
  • Disks of 2008 are more than 40 times more
    reliable than disks of 1988.
  • MTTR: Mean Time To Repair
  • The number of hours required to replace a disk
    drive, AND
  • Reconstruct the data stored on the failed disk
    drive.

8
Assumptions
  • The MTTF of a disk is independent of the other disks
    in a RAID.
  • Assume:
  • The MTTF of a disk is once every 100 years, and
  • An array of 1,000 such disks.
  • Then the MTTF of any single disk in the array is once
    every 37 days (see the sketch below).
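
The 37-day figure follows directly from the independence assumption; a minimal sketch of the arithmetic:

```python
# With N independent disks, the expected time until the first failure
# shrinks by a factor of N (exponential-lifetime assumption).
MTTF_DISK_YEARS = 100    # per the slide
NUM_DISKS = 1_000        # per the slide

mttf_any_disk_days = MTTF_DISK_YEARS * 365 / NUM_DISKS
print(f"{mttf_any_disk_days:.1f} days")   # ~36.5, i.e., roughly every 37 days
```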

9
RAID
  • RAID organizes D disks into nG groups where each
    group consists of G disks and C parity disks.
    Example:
  • D = 8
  • G = 4
  • C = 1
  • nG = 8/4 = 2

[Diagram: Parity Group 1 = Disks 1-4 plus Parity 1; Parity Group 2 = Disks 5-8 plus Parity 2]
11
RAID With 1 Group
  • With G disks in a group and C check disks, a
    failure is encountered when:
  • A disk in the group fails, AND
  • A second disk fails before the failed disk of
    step 1 is repaired.
  • The MTTF of a group of disks with RAID is shown
    below.
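
The equation image did not survive the transcript. The standard expression from the original "A Case for RAID" paper by Patterson, Gibson, and Katz, which this slide presumably reproduces, is:

```latex
\text{MTTF}_{\text{group}}
  = \frac{\text{MTTF}_{\text{disk}}}{G+C}
    \cdot \frac{1}{P(\text{second failure during repair})},
\qquad
P = \frac{\text{MTTR}}{\text{MTTF}_{\text{disk}}/(G+C-1)}
```

which simplifies to

```latex
\text{MTTF}_{\text{group}}
  = \frac{\text{MTTF}_{\text{disk}}^{2}}{(G+C)(G+C-1)\,\text{MTTR}}
```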

12
RAID With 1 Group (Cont)
  • The probability of another failure is given by the
    P term in the formula above.
  • MTTR includes the time required to:
  • Replace the failed disk drive, and
  • Reconstruct the content of the failed disk.
  • Performing step 2 in a lazy manner increases the
    duration of MTTR, and with it the probability of
    another failure.
  • What happens if we increase the number of data
    disks in a group? See the sketch below.
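
A hedged sketch of the trade-off the last question points at, using the group-MTTF formula above; the disk MTTF and MTTR values are assumptions, not from the slides:

```python
# Group MTTF = MTTF_disk^2 / ((G+C) * (G+C-1) * MTTR), per the formula above.
def group_mttf_hours(mttf_disk_h: float, G: int, C: int, mttr_h: float) -> float:
    return mttf_disk_h ** 2 / ((G + C) * (G + C - 1) * mttr_h)

MTTF_DISK = 30_000   # hours; an assumed rating
MTTR = 24            # hours to replace the disk and rebuild its content

for G in (4, 8, 16):
    print(f"G = {G:2d}: {group_mttf_hours(MTTF_DISK, G, 1, MTTR):>12,.0f} hours")
# Larger groups lower the group MTTF: both (G+C) and (G+C-1) grow.
```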

13
RAID with nG Groups
  • With nG groups, the Mean Time To Failure of the
    entire RAID is computed in a similar manner, as
    shown below.
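
The formula image is missing from the transcript; presumably it is the paper's expression, which divides the group MTTF by the number of groups:

```latex
\text{MTTF}_{\text{RAID}}
  = \frac{\text{MTTF}_{\text{group}}}{n_G}
  = \frac{\text{MTTF}_{\text{disk}}^{2}}{n_G\,(G+C)(G+C-1)\,\text{MTTR}}
```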

14
Review
  • RAID 1 and 3 were presented in the previous
    lecture.
  • Here is a quick review.

15
RAID 1: Disk Mirroring
  • Contents of Disks 1 and 2 are identical.
  • Redundant paths keep data available in the
    presence of either a controller or a disk failure.
  • A write operation by a CPU is directed to both
    disks.
  • A read operation is directed to one of the disks.
  • Each disk might be reading different sectors
    simultaneously.
  • Tandem's architecture.

[Diagram: CPU 1 connects through Controllers 1 and 2 to mirrored Disks 1 and 2]
16
RAID 3: Small Block Reads
  • Bit-interleaved.
  • Bad news: a small read of less than the group
    size requires reading the whole group.
  • E.g., a read of one sector requires reading 4
    sectors.
  • One parity group has a read rate identical to
    that of one disk.

[Diagram: the bit stream 01011110101010000001101001111 is bit-interleaved
across Disks 1-4, with the corresponding parity bits on the Parity disk]
17
RAID 3: Small Block Reads
  • Given a large number of disks, say D = 12, enhance
    performance by constructing several parity
    groups, say 3.
  • With G = 4 disks per group and D = 8 (say), the
    number of read requests supported by RAID 3,
    compared with one disk, is the number of groups
    (2). The number of groups is D/G.


[Diagram: Parity Group 1 = Disks 1-4 plus Parity 1; Parity Group 2 = Disks 5-8 plus Parity 2]
18
Any Questions?
19
A Few Questions?
  • Assume one instance of the RAID 1 organization. What
    are the values for:
  • D
  • G
  • C
  • nG

20
A Few Questions?
  • Assume one instance of the RAID 1 organization. What
    are the values for:
  • D = 1
  • G = 1
  • C = 1
  • nG = 1

21
A Few Questions?
  • Assume one instance of the RAID 1 organization. What
    are the values for:
  • D = 1
  • G = 1
  • C = 1
  • nG = 1
  • Are the availability characteristics of the
    following Level 3 RAID better than those of RAID 1?

[Diagram: one parity group consisting of Disks 1-4 plus Parity 1]
22
RAID 4
  • Enhances the performance of small reads, writes, and
    read-modify-writes. How?
  • Interleave data across disks at the granularity
    of a transfer unit. The minimum size is a sector.
  • Parity block ECC 1 is the exclusive-or (XOR) of the
    bits in Blocks a, b, c, and d (see the sketch below).

[Diagram: Blocks a, b, c, and d on Disks 1-4; their parity block ECC 1 on the Parity disk]
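
A minimal sketch of the parity computation, with hypothetical 2-byte blocks standing in for Blocks a-d:

```python
from functools import reduce

def parity_block(*blocks: bytes) -> bytes:
    """Bytewise XOR of equal-length data blocks: the check block ECC."""
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*blocks))

# Hypothetical tiny blocks standing in for Blocks a, b, c, and d:
a, b, c, d = b"\x0a\xff", b"\x06\x0f", b"\x13\x55", b"\x20\xaa"
ecc1 = parity_block(a, b, c, d)

# If Disk 2 fails, Block b is recoverable as the XOR of the survivors:
assert parity_block(a, c, d, ecc1) == b
```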
23
RAID 4
  • A small read retrieves its block from one disk.
  • Now, 4 requests referencing blocks on different
    data disks may proceed in parallel.
  • When compared with 1 disk, the read throughput of a
    D-disk system is D times higher.

24
RAID 4: Failures (Cont)
  • If Disk 2 fails, a small read for Block b
    retrieves Blocks a, c, d, and ECC 1 from Disks 1,
    3, 4, and the Parity disk to compute the missing
    block. What is the throughput relative to one disk
    now?
  • Once Disk 2 is replaced with a new one, its
    content is reconstructed either eagerly or in a
    lazy manner. The system cannot be too lazy because
    we want to minimize MTTR.

25
RAID 4: Failures (Cont)
  • If the Parity disk fails, reads of data blocks may
    proceed as in the normal mode of operation.
  • Once the Parity disk is replaced, the content of the
    new Parity disk is constructed either eagerly or
    lazily.

26
RAID 4: Small Writes
  • The performance of small writes is improved.
  • To write Block b:
  • Read the old Block b and the old parity block ECC 1,
  • Compute the new parity using the old Block b, the new
    Block b, and the old parity:
  • New ECC 1 = (old Block b XOR new Block b) XOR old
    ECC 1.
  • A write requires 4 disk accesses: 2 reads and 2
    writes (see the sketch below).

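
A sketch of the four-access small write above, again with hypothetical byte values:

```python
def xor_bytes(x: bytes, y: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(x, y))

# Accesses 1 and 2: read the old data block and the old parity block.
old_b, old_ecc1 = b"\x06\x0f", b"\x3f\x0f"   # hypothetical on-disk values
new_b = b"\x99\x42"                          # the block being written

# New parity = (old block XOR new block) XOR old parity.
new_ecc1 = xor_bytes(xor_bytes(old_b, new_b), old_ecc1)

# Accesses 3 and 4: write new_b to Disk 2 and new_ecc1 to the Parity disk.
```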
27
RAID 4: Bottlenecks
  • For writes, the Parity disk is a bottleneck.
  • Two different writes, to Blocks b and g, must read
    ECC 1 and ECC 2 from the same Parity disk. A queue
    will form on the Parity disk.
  • The performance of small writes is the same as
    RAID 3: D/2G.

[Diagram: two stripes on the same disks; Blocks a-d with ECC 1, and Blocks e-h
with ECC 2. Both parity blocks reside on the single Parity disk]
28
RAID 4 Summary
29
RAID 5: Resolve the Bottleneck
  • Distribute data and check blocks across all disks.

[Diagram: data Blocks a-t and parity blocks ECC 1-5 striped across Disks 1-5,
with each stripe's parity block placed on a different disk]
30
RAID 5: Resolve the Bottleneck
  • Writes of Blocks a and j may proceed in parallel
    now, since their parity blocks reside on different
    disks.

31
RAID 5: Read Performance
  • The disks holding check blocks also service read
    requests.
  • With D disks broken into nG groups, the number of
    parity disks is nG × C, where nG = D/G.
  • When compared with one disk, the read throughput of a
    D-disk system is D + C×D/G times higher.

32
RAID 5: Write Performance
  • For writes, read the referenced block and its
    parity block, compute the new parity block, then
    write the new data block and its parity block.
  • Each write continues to involve the parity block.
  • With D disks broken into nG groups, the number of
    parity disks is nG × C, where nG = D/G.
  • When compared with one disk, the write throughput of
    a D-disk system is D/4 + (C×D/G)/4 times higher.

33
RAID 5: R-M-W Performance
  • For R-M-W, the read and write of the data block come
    for free:
  • The referenced block is already retrieved. One
    extra disk I/O is needed to read the parity block;
    then compute the new parity block and write the
    new data block and its parity block.
  • Each R-M-W continues to involve the parity block.
  • With D disks broken into nG groups, the number of
    parity disks is nG × C, where nG = D/G.
  • When compared with one disk, the R-M-W throughput of
    a D-disk system is D/2 + (C×D/G)/2 times higher. See
    the sketch below.

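
A small sketch collecting the three RAID 5 multipliers from the preceding slides (relative to one disk, per the slides' formulas):

```python
# Relative throughput of a RAID 5 with D data disks in groups of G,
# and C check disks per group, per the formulas on the slides.
def raid5_relative_throughput(D: int, G: int, C: int = 1) -> dict:
    disks = D + C * D / G            # data disks plus nG*C parity disks
    return {
        "small read":  disks,        # every disk services reads
        "small write": disks / 4,    # 2 reads + 2 writes per logical write
        "small R-M-W": disks / 2,    # data I/O is free; parity costs 2
    }

print(raid5_relative_throughput(D=8, G=4))
# {'small read': 10.0, 'small write': 2.5, 'small R-M-W': 5.0}
```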
34
RAID 5 Summary
35
RAID 5 Summary
  • A significant improvement in the performance of
    small writes and R-M-W operations.

36
RAID Summary
  • If your workload consists of small R-M-W
    operations, which RAID would you choose?