Advanced Database System - PowerPoint PPT Presentation

Provided by: Zhiy2

Transcript and Presenter's Notes


1
Advanced Database System
  • CS 641
  • Lecture 4
  • Jan 24th 2008

2
Accelerating access to secondary storage
  • Place blocks that are accessed together on the
    same cylinder.
  • Divide the data among several smaller disks
    rather than a large one.
  • Mirror a disk
  • Use a disk scheduling algorithm
  • Prefetch blocks to main memory

3
Organizing data by cylinders
  • Objective: reduce seek time.
  • Method: analyze application behavior, and put data that is likely to be accessed together on a single cylinder or on adjacent cylinders.
  • Thus, if we read all the blocks on a single track or on a cylinder consecutively, we can neglect all but the first seek time and the first rotational latency.

4
Example
  • Megatron 747
  • Avg transfer time, seek time and rotational latency are 0.25 ms, 6.46 ms and 4.17 ms respectively.
  • Sorting 10,000,000 records takes 74 mins with TPMMS.
  • Each cylinder stores 8 MB (512 blocks).
  • We store the data on 100,000/512 ≈ 196 cylinders.
  • We must read 100 MB/8 MB ≈ 13 different cylinders to fill main memory once.

5
Example (Cont.)
  • The total time to fill main memory once:
  • 6.46 ms for one avg seek
  • 12 ms for 12 one-cylinder seeks
  • 1.60 s for 6400 blocks (0.25 ms per block)
  • We need to fill main memory 16 times. That is about 1.6 × 16 = 25.6 s.
  • Reading and writing together take 2 × 25.6 s = 51.2 s.
  • However, this mechanism cannot help phase 2, because there we must read/write one block at a time. Phase 2 still needs 37 mins.
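The phase-1 arithmetic can be rechecked with a short script. The figures are the Megatron 747 numbers from the slides; taking a one-cylinder seek as 1 ms is an assumption consistent with the "12 ms for 12 one-cylinder seeks" line:

```python
# Recheck the slide's phase-1 timing (Megatron 747 figures).
avg_seek_ms = 6.46        # average seek
one_cyl_seek_ms = 1.0     # assumed ~1 ms per one-cylinder seek
transfer_ms = 0.25        # per 16 KB block
blocks_per_fill = 6400    # 100 MB of main memory / 16 KB blocks

fill_ms = avg_seek_ms + 12 * one_cyl_seek_ms + blocks_per_fill * transfer_ms
phase1_s = 2 * 16 * fill_ms / 1000    # 16 fills, both read and write
print(f"one fill: {fill_ms / 1000:.2f} s, phase 1: {phase1_s:.1f} s")
```

Note the slide rounds the 1.62 s per fill down to 1.6 s, which is why it reports 51.2 s rather than the exact 51.8 s.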

6
Using multiple disks
  • With a single disk, only one head can read/write data at a time.
  • With multiple disks, we can read/write data on different disks at the same time.

7
Example
  • Replace the Megatron 747 with 4 Megatron 737s.
  • Divide the records among the 4 disks. The time to fill 100 MB of main memory becomes 1.6/4 = 0.4 s.
  • The entire phase 1 takes 51.2/4 = 12.8 s.
  • In phase 2, TPMMS must be modified to take advantage of the 4 disks:
  • comparison starts as soon as the first element of the new block appears in main memory.
  • The overall speedup is roughly 2-3×.

8
Mirroring disks
  • Have two or more disks hold identical copies of
    data.
  • It is used for reliability; however, it can also be used to speed up access to data.
  • For example, with four copies the system can guarantee retrieval of 4 blocks at once (reading in parallel).
  • It can speed up reads, but not writes.

9
Disk scheduling
  • Useful to reduce the access latency of many small processes that each access a few blocks.
  • Definition: in disk I/O, seek time and rotational latency dominate. Since pending requests wait in a queue, reducing these costs per request speeds up the whole system. Disk scheduling algorithms aim to reduce the total seek time and rotational latency over all requests.
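The elevator policy discussed on the next slides can be sketched in a few lines. This is a minimal model over cylinder numbers only (no rotational latency); the request queue and starting cylinder in the example are made-up values, not from the slides:

```python
def elevator_schedule(requests, start, direction=1):
    """Serve cylinder requests in elevator (SCAN) order: the head sweeps
    in one direction, serving every pending request it passes, and
    reverses only when no requests remain ahead of it."""
    pending = sorted(set(requests))   # duplicates collapsed
    order = []
    pos, d = start, direction
    while pending:
        ahead = [c for c in pending if (c >= pos if d > 0 else c <= pos)]
        if not ahead:                 # nothing ahead: reverse direction
            d = -d
            continue
        nxt = min(ahead) if d > 0 else max(ahead)
        order.append(nxt)
        pending.remove(nxt)
        pos = nxt
    return order

print(elevator_schedule([98, 183, 37, 122, 14, 124, 65, 67], start=53))
# sweeping upward first: [65, 67, 98, 122, 124, 183, 37, 14]
```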

10
Shortest Seek Time First (SSTF)
11
Elevator
12
Example
13
Elevator
14
FCFS
15
Comments
  • The elevator algorithm improves throughput further as the average number of requests waiting for the disk increases.

16
Example
  • Assume a Megatron 747 with 16,384 cylinders.
  • With 1000 pending I/O requests:
  • avg time per request: 1 + 0.25 + 4.17 = 5.42 ms (successive stops are about 16 cylinders apart, roughly a 1 ms seek)
  • compare to the earlier random-access figure (10.88 ms)
  • the 1000 I/Os finish in 5.42 s, and the avg delay to satisfy a request is half of that, 2.71 s
  • With 32,768 pending I/O requests:
  • avg time per request: 1/2 + 0.25 + (1/2)(2/3)(8.33) = 3.53 ms
  • the 32,768 I/Os finish in 116 s, and the avg delay to satisfy a request is 58 s
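The two averages above can be recomputed directly (all figures from the slides; the Megatron 747 rotates in 8.33 ms, so the average rotational latency is 4.17 ms):

```python
transfer = 0.25   # ms per block
# 1000 requests over 16,384 cylinders: stops are ~16 cylinders apart,
# which the slide treats as a ~1 ms seek.
avg_1000 = 1.0 + transfer + 4.17
# 32,768 requests: ~2 per cylinder, ~0.5 ms seek, and with 2 blocks on
# the same cylinder the expected rotational wait is (1/2)(2/3) rotation.
avg_32768 = 0.5 + transfer + (1 / 2) * (2 / 3) * 8.33
print(round(avg_1000, 2), round(avg_32768, 2))   # 5.42 3.53
print(round(32768 * avg_32768 / 1000))           # 116 (seconds, total)
```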

17
Prefetching (double buffering)
  • Load blocks into main memory before they are needed.
  • Advantage: disk I/Os can be scheduled better.
  • We gain a speedup in block access.

18
Example
  • In the previous example, if two blocks are given to each sorted sublist (in phase 2), then when the records in one block are used up, we can switch to the other without delay.
  • However, if those 100,000 blocks are accessed at random, prefetching gains little. To solve that:
  • store each sorted sublist whole, on consecutive tracks/cylinders;
  • read whole tracks or whole cylinders at a time.

19
Another Example
  • In the previous example, each sublist has 2 track-sized buffers, needing 16 MB of main memory.
  • Time to read a whole track: 6.46 + 8.33 = 14.79 ms.
  • We have 196 cylinders (3136 tracks).
  • The time to read them all: 3136 × 14.79 ms ≈ 46.4 s.
  • With 2 cylinder-sized (16-track) buffers, we need 256 MB of main memory.
  • Time to read a whole cylinder: 6.46 + 16 × 8.33 ≈ 140 ms.
  • The time to read them all: 196 × 140 ms ≈ 27.44 s.

20
Two different applications
  • A) the regular situation, where blocks are read and written in a sequence that can be predicted in advance, and only one process uses the disk (TPMMS phase 1).
  • B) a collection of short processes that execute in parallel and share the same disk, whose accesses cannot be predicted in advance (TPMMS phase 2).

21
Cylinder-based organization
  • Advantage: excellent for A).
  • Disadvantage: no help for B).

22
Multiple disks
  • Advantage: increases read/write access rates for both applications.
  • Problem: reads/writes to the same disk cannot be satisfied at the same time.
  • Disadvantage: cost.

23
Mirroring
  • Advantage: increases read rates for both applications; improves fault tolerance for both applications.
  • Disadvantage: pay for two or more disks but get the capacity of one.

24
Disk scheduling (Elevator algorithm)
  • Advantage: reduces the average time to read/write when the accesses to blocks are unpredictable.
  • Problem: it is effective only with a large number of pending I/O requests, and then the average delay for each process is high.

25
Prefetching
  • Advantage: speeds up access when the needed blocks are known but the timing of requests is data-dependent.
  • Disadvantage: requires extra main-memory buffers; no help when accesses are random.

26
Exercises 11.5.2
  • Suppose we use two Megatron 747 disks as mirrors. However, instead of allowing reads of any block from either disk, we keep the head of the first disk in the inner half of the cylinders, and the head of the second disk in the outer half of the cylinders. Assuming read requests are on random tracks:
  • What is the average rate at which this system can
    read blocks?
  • How does this rate compare to no restriction?
  • What disadvantages do you foresee for this system?

27
Disk failure
  • Intermittent failure: an attempt to read/write a sector fails, but repeated tries succeed.
  • Media decay: a bit or bits are permanently corrupted, so the corresponding sector is damaged.
  • Write failure: an attempt to write a sector fails.
  • Disk crash: the entire disk becomes unreadable, suddenly and permanently.

28
Techniques
  • Parity check: detects intermittent failures.
  • Stable storage: prevents damage from media decay/write failures.
  • RAID: copes with disk crashes.
  • Basic idea: use additional storage to keep redundant information.

29
A useful model
  • Disk sectors are ordinarily stored with some
    redundant bits.
  • A read returns a pair (w, s), where w is the data in the sector that is read, and s is a status bit that tells whether or not the read was successful.

30
Checksum
  • Additional bits are set depending on the values of the data bits stored in the sector.
  • A simple form is based on the parity of all the bits in the sector.
  • If there is an odd number of 1s among the data bits, they have odd parity, and their parity bit is 1.
  • If there is an even number of 1s among the data bits, they have even parity, and their parity bit is 0.
  • Thus, the number of 1s among a collection of bits and their parity bit is always even.

31
Example
  • 01101000: the parity bit is 1, so it is stored as 011010001.
  • 11101110: the parity bit is 0, so it is stored as 111011100.
  • If one bit flips, the error is detected; if more than one bit is corrupted, there is a 50% chance the error will not be detected.
  • More parity bits can be used to detect more errors.
  • In general, if n independent bits are used as a checksum, the chance of missing an error is only 1/2^n.
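The single-bit parity scheme above can be sketched directly. This is a toy model over bit strings, not a real sector format; the last check shows why a single flipped bit is always caught, while two flips cancel out (hence the 50% figure for multi-bit errors):

```python
def parity_bit(bits: str) -> str:
    """'1' if the data bits hold an odd number of 1s, so that data
    plus parity bit always has an even number of 1s."""
    return str(bits.count('1') % 2)

def store(bits: str) -> str:
    """Append the parity bit, as in the slide's examples."""
    return bits + parity_bit(bits)

def check(sector: str) -> bool:
    """A stored sector passes the check iff its 1-count is even."""
    return sector.count('1') % 2 == 0

print(store('01101000'))    # '011010001'
print(store('11101110'))    # '111011100'
print(check('011010001'))   # True
print(check('011010101'))   # False: a single flipped bit is caught
```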

32
Stable storage
  • Problem: a checksum can detect an error, but cannot fix it. For critical applications, such as banks and airlines, this is not enough.
  • Stable storage: pair the sectors, each pair representing one sector-contents X.
  • We call the two copies XL and XR.
  • If a read returns (w, good) for either XL or XR, then w is the true value of X.

33
Writing policy
  • Write the value of X into XL. Check that the read-back status is good. Otherwise, repeat the write; if it still does not succeed after several tries, a media failure is detected, and we find a spare sector for XL.
  • Repeat for XR.

34
Reading policy
  • To obtain the value of X, read XL. If the status is bad, retry several times. If the status eventually becomes good, take w as the value of X.
  • If XL cannot be read at all, repeat for XR.
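The two policies can be sketched as code. The sector API here (write_sector/read_sector returning a (w, status) pair) is an in-memory stand-in invented for illustration, not a real disk interface, and MAX_RETRIES is an arbitrary choice:

```python
MAX_RETRIES = 3   # "several times" in the policy; value is arbitrary

class FakeDisk:
    """In-memory stand-in: a missing sector reads back as 'bad'."""
    def __init__(self):
        self.sectors = {}
    def write_sector(self, s, v):
        self.sectors[s] = v
    def read_sector(self, s):
        return (self.sectors[s], 'good') if s in self.sectors else (None, 'bad')

def stable_write(disk, xl, xr, value):
    """Write X into XL, verify, then repeat for XR; repeated failures
    on one copy signal a media failure on that sector."""
    for sector in (xl, xr):
        for _ in range(MAX_RETRIES):
            disk.write_sector(sector, value)
            _, status = disk.read_sector(sector)
            if status == 'good':
                break
        else:
            raise IOError(f"media failure on sector {sector}")

def stable_read(disk, xl, xr):
    """Return X from whichever copy first reads back with status good."""
    for sector in (xl, xr):
        for _ in range(MAX_RETRIES):
            w, status = disk.read_sector(sector)
            if status == 'good':
                return w
    raise IOError("both copies of X are unreadable")

d = FakeDisk()
stable_write(d, 'XL', 'XR', 'X-value')
del d.sectors['XL']                  # simulate losing the left copy
print(stable_read(d, 'XL', 'XR'))    # still recovers: 'X-value'
```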

35
Error-handling capabilities
  • Media failure:
  • only if both copies fail can we not recover the value of X; otherwise, we can read X.
  • Write failure: if a system failure (e.g. a power outage) occurs while writing X, the failure can be detected and repaired:
  • if the failure occurred while we were writing XL, the status of XL becomes bad, but XR still holds the old value of X;
  • if the failure occurred after we wrote XL, XL's status is good, and we write XR with the value in XL.

36
Exercises
  • 11.6.1

37
Recovery from disk crashes
  • The most common strategy is RAID, Redundant
    Arrays of Independent Disks.

38
The failure model for disks
  • Mean time to failure (MTTF): the length of time by which 50% of a population of disks will have failed catastrophically.
  • For modern disks: about 10 years.

39
A survival rate curve for disks
40
Mechanism
  • One or more disks hold the data (data disks), and
  • one or more added disks hold information that is completely determined by the contents of the data disks (redundant disks).
  • If either a data disk or a redundant disk crashes, the other disks can be used to restore it, so there is no permanent information loss.

41
Mirroring (RAID 1)
  • Mirror each data disk with a redundant disk.
  • Data loss occurs only when both disks fail simultaneously.
  • Example:
  • a disk has a 10% probability of failing in any given year;
  • replacing a failed disk takes 3 hours (1/2920 of a year);
  • the probability that the mirror fails during the replacement is 10% × 1/2920 = 1/29,200;
  • one of the two disks fails, on average, once every 5 years;
  • so the rate of data loss is (1/5) × (1/29,200) = 1/146,000 per year, i.e. once per 146,000 years.
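The mean-time-to-data-loss arithmetic can be checked in a few lines (all figures from the slide):

```python
p_fail_per_year = 0.10     # each disk: 10% chance of failing per year
replace_years = 3 / 8760   # 3-hour replacement = 1/2920 of a year
# Data is lost only if the surviving mirror fails inside that window:
p_loss_given_failure = p_fail_per_year * replace_years   # 1/29,200
# One of the two disks fails about once every 5 years, so:
mean_years_to_loss = 5 / p_loss_given_failure
print(round(mean_years_to_loss))   # 146000
```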

42
Parity blocks (RAID 4)
  • The mirroring approach uses as many redundant disks as data disks.
  • RAID 4 uses only 1 redundant disk no matter how many data disks there are.
  • Mechanism: assume all disks are identical; on the redundant disk, the ith block consists of parity checks for the ith blocks of all the data disks.

43
Example
  • 3 data disks 1,2,3. A redundant disk 4.
  • Disk 1 11110000
  • Disk 2 10101010
  • Disk 3 00111000
  • Then on redundant disk
  • Disk 4 01100010

44
Reading example
  • Suppose we are reading a block from disk 1, and another request comes in to read a different block of the same disk.
  • Ordinarily, we would have to wait for the first request to finish.
  • However, we can compute the requested block by reading the corresponding blocks from the other disks and taking their modulo-2 sum.

45
Reading example (Cont.)
  • Disk 2 10101010
  • Disk 3 00111000
  • Disk 4 01100010
  • We can get
  • Disk 1 11110000
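Both the parity block and the reconstruction above are the same XOR. A tiny helper over bit strings (a toy model, not a disk driver) verifies the slides' values:

```python
def parity(*blocks: str) -> str:
    """Bitwise modulo-2 sum (XOR) of equal-length bit strings."""
    return ''.join(str(sum(int(b[i]) for b in blocks) % 2)
                   for i in range(len(blocks[0])))

d1, d2, d3 = '11110000', '10101010', '00111000'
d4 = parity(d1, d2, d3)
print(d4)                  # '01100010': the redundant block
print(parity(d2, d3, d4))  # '11110000': disk 1 rebuilt from the others
```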

46
Writing
  • When we write a new block on a data disk, we must update the corresponding block of the redundant disk as well.
  • With n data disks, a naive approach needs n+1 disk I/Os.
  • A better approach needs only the old and new versions of the data block being written:
  • read the old value of the block to be changed;
  • read the corresponding block of the redundant disk;
  • write the new data block;
  • recalculate and write the block of the redundant disk.

47
Writing example
  • Disk 1 11110000
  • Disk 2 10101010
  • Disk 3 00111000
  • Disk 4 01100010
  • Now, change the block on disk 2 to 11001100.
  • Thus, the values in positions 2, 3, 6 and 7 have changed.
  • The block on the redundant disk becomes the modulo-2 sum of the change mask 01100110 and 01100010, which is 00000100.
  • Only 4 disk I/Os are needed.
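The update rule behind this example can be written as one function over bit strings (a toy sketch, not real block I/O):

```python
def updated_parity(old_parity: str, old_block: str, new_block: str) -> str:
    """New parity = old parity XOR old data XOR new data: only the
    positions that changed flip, so a write costs 4 I/Os for any n."""
    xor = lambda a, b: ''.join('1' if x != y else '0' for x, y in zip(a, b))
    return xor(old_parity, xor(old_block, new_block))

print(updated_parity('01100010', '10101010', '11001100'))  # '00000100'
```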

48
Failure recovery
  • If the redundant disk fails, replace it with a new disk and recompute the redundant blocks.
  • If one of the data disks fails, replace it with a new disk and recompute its data from the other disks.
  • Rule: the bit in any position is the modulo-2 sum of the bits in the corresponding positions of all the other disks.

49
Example
  • If disk 2 fails
  • Disk 1 11110000
  • Disk 2 ????????
  • Disk 3 00111000
  • Disk 4 01100010
  • Take the modulo-2 sum of each column
  • Disk 2 10101010

50
An improvement RAID 5
  • Problem with RAID 4:
  • with n data disks, the number of disk writes to the redundant disk is n times the average number of writes to any one data disk.
  • To solve this, RAID 5 treats each disk as the redundant disk for some of the blocks.

51
Example
52
Example
  • N = 4 (4 disks)
  • Each disk is the redundant disk for 1/4 of the blocks and a data disk for the other 3/4, so the average share of writes each disk receives is 1/4 + (3/4)(1/3) = 1/2.

53
Coping with multiple disk crashes (RAID 6)
  • RAID 4 and 5 cannot deal with multiple simultaneous crashes.
  • To deal with two simultaneous crashes, a Hamming code (an error-correcting code) is used.
  • In the following discussion:
  • 4 data disks (1, 2, 3, 4) and 3 redundant disks (5, 6, 7).

54
Redundancy pattern
55
Rules
  • The bits of disk 5 are the modulo-2 sum of the
    corresponding bits of disks 1,2, and 3
  • The bits of disk 6 are the modulo-2 sum of the
    corresponding bits of disks 1,2, and 4
  • The bits of disk 7 are the modulo-2 sum of the
    corresponding bits of disks 1,3, and 4
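The three rules can be expressed as one function over bit strings (a toy sketch); applied to the data blocks of the example on slide 58, it reproduces the redundant disks given there:

```python
def xor(*blocks: str) -> str:
    """Bitwise modulo-2 sum of equal-length bit strings."""
    return ''.join(str(sum(int(b[i]) for b in blocks) % 2)
                   for i in range(len(blocks[0])))

def redundant_disks(d1, d2, d3, d4):
    """Compute disks 5, 6 and 7 per the three rules above."""
    return xor(d1, d2, d3), xor(d1, d2, d4), xor(d1, d3, d4)

d5, d6, d7 = redundant_disks('11110000', '10101010', '00111000', '01000001')
print(d5, d6, d7)   # 01100010 00011011 10001001
```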

56
Reading
  • We can read data from any data disk normally; the redundant disks can be ignored.

57
Writing
  • The idea is similar, but now several redundant
    disks may be involved.

58
Example
  • Disk 1 11110000
  • Disk 2 10101010
  • Disk 3 00111000
  • Disk 4 01000001
  • Disk 5 01100010
  • Disk 6 00011011
  • Disk 7 10001001
  • Rewrite disk 2 to be 00001111; positions 1, 3, 6 and 8 change.
  • Since disks 5 and 6 are computed using disk 2, we must modify them accordingly:
  • Disk 5 11000111
  • Disk 6 10111110

59
Failure recovery (2 simultaneous disk crashes)
  • Disk 1 11110000
  • Disk 2 ????????
  • Disk 3 00111000
  • Disk 4 01000001
  • Disk 5 ????????
  • Disk 6 10111110
  • Disk 7 10001001
  • Since disk 6 is computed from disks 1, 2 and 4, we can recover the data on disk 2 from disks 1, 4 and 6:
  • Disk 2 00001111
  • Disk 5 is computed from disks 1, 2 and 3:
  • Disk 5 11000111
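The same XOR helper confirms this two-crash recovery (bit values from the slide; a toy sketch over bit strings):

```python
def xor(*blocks: str) -> str:
    """Bitwise modulo-2 sum of equal-length bit strings."""
    return ''.join(str(sum(int(b[i]) for b in blocks) % 2)
                   for i in range(len(blocks[0])))

d1, d3, d4 = '11110000', '00111000', '01000001'
d6 = '10111110'
d2 = xor(d1, d4, d6)   # disk 6 = d1 ^ d2 ^ d4, hence d2 = d1 ^ d4 ^ d6
d5 = xor(d1, d2, d3)   # then recompute disk 5 from its own rule
print(d2, d5)          # 00001111 11000111
```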

60
Exercises
  • 11.7.1
  • 11.7.4 a)
  • 11.7.5 a)
  • 11.7.6
  • 11.7.8 a)

61
Summary
  • Memory Hierarchy
  • Tertiary storage
  • Disk/secondary storage
  • Blocks, sectors, cylinders, etc.
  • Disk access time
  • TPMMS
  • Speed up disk access
  • Disk failure model
  • Checksum
  • Stable storage
  • RAID