Advanced Database System - PowerPoint PPT Presentation

Provided by: Zhiy2

Transcript and Presenter's Notes


1
Advanced Database System
  • CS 641
  • Lecture 4
  • Jan 24th 2008

2
Accelerating access to secondary storage
  • Place blocks that are accessed together on the
    same cylinder.
  • Divide the data among several smaller disks
    rather than a large one.
  • Mirror a disk
  • Use a disk scheduling algorithm
  • Prefetch blocks to main memory

3
Organizing data by cylinders
  • Objective: reduce seek time.
  • Method: analyze application behavior, and put data that is likely to be accessed together on a single cylinder or on adjacent cylinders.
  • Thus, if we read all the blocks on a single track or on a cylinder consecutively, we can neglect all but the first seek time and the first rotational latency.

4
Example
  • Megatron 747
  • Avg transfer time, seek time and rotational latency are 0.25 ms, 6.46 ms and 4.17 ms respectively.
  • Sorting 10,000,000 records takes 74 mins with TPMMS.
  • Each cylinder stores 8 MB (512 blocks).
  • We store the data on 100,000/512 ≈ 196 cylinders.
  • We must read 100 MB/8 MB ≈ 13 different cylinders to fill main memory once.

5
Example (Cont.)
  • The total time to fill main memory once:
  • 6.46 ms for one avg seek
  • 12 ms for 12 one-cylinder seeks
  • 1.60 s for 6400 blocks (0.25 ms per block)
  • We need to fill main memory 16 times. That is about 1.6 × 16 = 25.6 s.
  • Reading and writing together take 2 × 25.6 s = 51.2 s.
  • However, this mechanism cannot help phase 2, because there we must read/write one block at a time. Phase 2 still needs 37 mins.
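The phase-1 arithmetic can be rechecked with a short script. The figures are the Megatron 747 numbers from the slides; taking a one-cylinder seek as 1 ms is an assumption consistent with the "12 ms for 12 one-cylinder seeks" line:

```python
# Recheck the slide's phase-1 timing (Megatron 747 figures).
avg_seek_ms = 6.46        # average seek
one_cyl_seek_ms = 1.0     # assumed ~1 ms per one-cylinder seek
transfer_ms = 0.25        # per 16 KB block
blocks_per_fill = 6400    # 100 MB of main memory / 16 KB blocks

fill_ms = avg_seek_ms + 12 * one_cyl_seek_ms + blocks_per_fill * transfer_ms
phase1_s = 2 * 16 * fill_ms / 1000    # 16 fills, both read and write
print(f"one fill: {fill_ms / 1000:.2f} s, phase 1: {phase1_s:.1f} s")
```

Note the slide rounds the 1.62 s per fill down to 1.6 s, which is why it reports 51.2 s rather than the exact 51.8 s.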

6
Using multiple disks
  • With a single disk, only one head can read/write data at a time.
  • With multiple disks, we can read/write data on different disks at the same time.

7
Example
  • Replace the Megatron 747 with 4 Megatron 737s.
  • Divide the records among the 4 disks. The time to fill 100 MB of main memory becomes 1.6/4 = 0.4 s.
  • The entire phase 1 takes 51.2/4 = 12.8 s.
  • In phase 2, TPMMS must be modified to take advantage of the 4 disks:
  • comparison starts as soon as the first element of the new block appears in main memory.
  • The overall speedup is roughly 2-3×.

8
Mirroring disks
  • Have two or more disks hold identical copies of
    data.
  • It is used for reliability; however, it can also be used to speed up access to data.
  • For example, with four copies the system can guarantee retrieval of 4 blocks at once (reading in parallel).
  • It can speed up reads, but not writes.

9
Disk scheduling
  • Useful to reduce the access latency of many small processes that each access a few blocks.
  • Definition: in disk I/O, seek time and rotational latency dominate. Since pending requests wait in a queue, reducing these costs per request speeds up the whole system. Disk scheduling algorithms aim to reduce the total seek time and rotational latency over all requests.
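The elevator policy discussed on the next slides can be sketched in a few lines. This is a minimal model over cylinder numbers only (no rotational latency); the request queue and starting cylinder in the example are made-up values, not from the slides:

```python
def elevator_schedule(requests, start, direction=1):
    """Serve cylinder requests in elevator (SCAN) order: the head sweeps
    in one direction, serving every pending request it passes, and
    reverses only when no requests remain ahead of it."""
    pending = sorted(set(requests))   # duplicates collapsed
    order = []
    pos, d = start, direction
    while pending:
        ahead = [c for c in pending if (c >= pos if d > 0 else c <= pos)]
        if not ahead:                 # nothing ahead: reverse direction
            d = -d
            continue
        nxt = min(ahead) if d > 0 else max(ahead)
        order.append(nxt)
        pending.remove(nxt)
        pos = nxt
    return order

print(elevator_schedule([98, 183, 37, 122, 14, 124, 65, 67], start=53))
# sweeping upward first: [65, 67, 98, 122, 124, 183, 37, 14]
```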

10
Shortest Seek Time First (SSTF)
11
Elevator
12
Example
13
Elevator
14
FCFS
15
Comments
  • The elevator algorithm improves throughput further as the average number of requests waiting for the disk increases.

16
Example
  • Assume a Megatron 747 with 16,384 cylinders.
  • With 1000 pending I/O requests:
  • avg time per request: 1 + 0.25 + 4.17 = 5.42 ms (successive stops are about 16 cylinders apart, roughly a 1 ms seek)
  • compare to the earlier random-access figure (10.88 ms)
  • the 1000 I/Os finish in 5.42 s, and the avg delay to satisfy a request is half of that, 2.71 s
  • With 32,768 pending I/O requests:
  • avg time per request: 1/2 + 0.25 + (1/2)(2/3)(8.33) = 3.53 ms
  • the 32,768 I/Os finish in 116 s, and the avg delay to satisfy a request is 58 s
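The two averages above can be recomputed directly (all figures from the slides; the Megatron 747 rotates in 8.33 ms, so the average rotational latency is 4.17 ms):

```python
transfer = 0.25   # ms per block
# 1000 requests over 16,384 cylinders: stops are ~16 cylinders apart,
# which the slide treats as a ~1 ms seek.
avg_1000 = 1.0 + transfer + 4.17
# 32,768 requests: ~2 per cylinder, ~0.5 ms seek, and with 2 blocks on
# the same cylinder the expected rotational wait is (1/2)(2/3) rotation.
avg_32768 = 0.5 + transfer + (1 / 2) * (2 / 3) * 8.33
print(round(avg_1000, 2), round(avg_32768, 2))   # 5.42 3.53
print(round(32768 * avg_32768 / 1000))           # 116 (seconds, total)
```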

17
Prefetching (double buffering)
  • Load blocks into main memory before they are needed.
  • Advantage: disk I/Os can be scheduled better.
  • We gain a speedup in block access.

18
Example
  • In the previous example, if two blocks are given to each sorted sublist (in phase 2), then when the records in one block are used up, we can switch to the other without delay.
  • However, if those 100,000 blocks are accessed at random, prefetching gains little. To solve that:
  • store each sorted sublist whole, on consecutive tracks/cylinders;
  • read whole tracks or whole cylinders at a time.

19
Another Example
  • In the previous example, each sublist has 2 track-sized buffers, needing 16 MB of main memory.
  • Time to read a whole track: 6.46 + 8.33 = 14.79 ms.
  • We have 196 cylinders (3136 tracks).
  • The time to read them all: 3136 × 14.79 ms ≈ 46.4 s.
  • With 2 cylinder-sized (16-track) buffers, we need 256 MB of main memory.
  • Time to read a whole cylinder: 6.46 + 16 × 8.33 ≈ 140 ms.
  • The time to read them all: 196 × 140 ms ≈ 27.44 s.

20
Two different applications
  • A) the regular situation, where blocks are read and written in a sequence that can be predicted in advance, and only one process uses the disk (TPMMS phase 1).
  • B) a collection of short processes that execute in parallel and share the same disk, whose accesses cannot be predicted in advance (TPMMS phase 2).

21
Cylinder-based organization
  • Advantage: excellent for A).
  • Disadvantage: no help for B).

22
Multiple disks
  • Advantage: increases read/write access rates for both applications.
  • Problem: reads/writes to the same disk cannot be satisfied at the same time.
  • Disadvantage: cost.

23
Mirroring
  • Advantage: increases read rates for both applications; improves fault tolerance for both applications.
  • Disadvantage: pay for two or more disks but get the capacity of one.

24
Disk scheduling (Elevator algorithm)
  • Advantage: reduces the average time to read/write when the accesses to blocks are unpredictable.
  • Problem: it is effective only with a large number of pending I/O requests, and then the average delay for each process is high.

25
Prefetching
  • Advantage: speeds up access when the needed blocks are known but the timing of requests is data-dependent.
  • Disadvantage: requires extra main-memory buffers; no help when accesses are random.

26
Exercises 11.5.2
  • Suppose we use two Megatron 747 disks as mirrors. However, instead of allowing reads of any block from either disk, we keep the head of the first disk in the inner half of the cylinders, and the head of the second disk in the outer half of the cylinders. Assuming read requests are on random tracks:
  • What is the average rate at which this system can
    read blocks?
  • How does this rate compare to no restriction?
  • What disadvantages do you foresee for this system?

27
Disk failure
  • Intermittent failure: an attempt to read/write a sector fails, but repeated tries succeed.
  • Media decay: a bit or bits are permanently corrupted, so the corresponding sector is damaged.
  • Write failure: an attempt to write a sector fails.
  • Disk crash: the entire disk becomes unreadable, suddenly and permanently.

28
Techniques
  • Parity check: detects intermittent failures.
  • Stable storage: prevents damage from media decay/write failures.
  • RAID: copes with disk crashes.
  • Basic idea: use additional storage to keep redundant information.

29
A useful model
  • Disk sectors are ordinarily stored with some
    redundant bits.
  • A read returns a pair (w, s), where w is the data in the sector that is read, and s is a status bit that tells whether or not the read was successful.

30
Checksum
  • Additional bits are set depending on the values of the data bits stored in the sector.
  • A simple form is based on the parity of all the bits in the sector.
  • If there is an odd number of 1s among the data bits, they have odd parity, and their parity bit is 1.
  • If there is an even number of 1s among the data bits, they have even parity, and their parity bit is 0.
  • Thus, the number of 1s among a collection of bits and their parity bit is always even.

31
Example
  • 01101000: the parity bit is 1, so it is stored as 011010001.
  • 11101110: the parity bit is 0, so it is stored as 111011100.
  • If one bit flips, the error is detected; if more than one bit is corrupted, there is a 50% chance the error will not be detected.
  • More parity bits can be used to detect more errors.
  • In general, if n independent bits are used as a checksum, the chance of missing an error is only 1/2^n.
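The single-bit parity scheme above can be sketched directly. This is a toy model over bit strings, not a real sector format; the last check shows why a single flipped bit is always caught, while two flips cancel out (hence the 50% figure for multi-bit errors):

```python
def parity_bit(bits: str) -> str:
    """'1' if the data bits hold an odd number of 1s, so that data
    plus parity bit always has an even number of 1s."""
    return str(bits.count('1') % 2)

def store(bits: str) -> str:
    """Append the parity bit, as in the slide's examples."""
    return bits + parity_bit(bits)

def check(sector: str) -> bool:
    """A stored sector passes the check iff its 1-count is even."""
    return sector.count('1') % 2 == 0

print(store('01101000'))    # '011010001'
print(store('11101110'))    # '111011100'
print(check('011010001'))   # True
print(check('011010101'))   # False: a single flipped bit is caught
```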

32
Stable storage
  • Problem: a checksum can detect an error, but cannot fix it. For critical applications, such as banks and airlines, this is not enough.
  • Stable storage: pair the sectors, each pair representing one sector-contents X.
  • We call the two copies XL and XR.
  • If a read returns (w, good) for either XL or XR, then w is the true value of X.

33
Writing policy
  • Write the value of X into XL. Check that the read-back status is good. Otherwise, repeat the write; if it still does not succeed after several tries, a media failure is detected, and we find a spare sector for XL.
  • Repeat for XR.

34
Reading policy
  • To obtain the value of X, read XL. If the status is bad, retry several times. If the status eventually becomes good, take w as the value of X.
  • If XL cannot be read at all, repeat for XR.
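The two policies can be sketched as code. The sector API here (write_sector/read_sector returning a (w, status) pair) is an in-memory stand-in invented for illustration, not a real disk interface, and MAX_RETRIES is an arbitrary choice:

```python
MAX_RETRIES = 3   # "several times" in the policy; value is arbitrary

class FakeDisk:
    """In-memory stand-in: a missing sector reads back as 'bad'."""
    def __init__(self):
        self.sectors = {}
    def write_sector(self, s, v):
        self.sectors[s] = v
    def read_sector(self, s):
        return (self.sectors[s], 'good') if s in self.sectors else (None, 'bad')

def stable_write(disk, xl, xr, value):
    """Write X into XL, verify, then repeat for XR; repeated failures
    on one copy signal a media failure on that sector."""
    for sector in (xl, xr):
        for _ in range(MAX_RETRIES):
            disk.write_sector(sector, value)
            _, status = disk.read_sector(sector)
            if status == 'good':
                break
        else:
            raise IOError(f"media failure on sector {sector}")

def stable_read(disk, xl, xr):
    """Return X from whichever copy first reads back with status good."""
    for sector in (xl, xr):
        for _ in range(MAX_RETRIES):
            w, status = disk.read_sector(sector)
            if status == 'good':
                return w
    raise IOError("both copies of X are unreadable")

d = FakeDisk()
stable_write(d, 'XL', 'XR', 'X-value')
del d.sectors['XL']                  # simulate losing the left copy
print(stable_read(d, 'XL', 'XR'))    # still recovers: 'X-value'
```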

35
Error-handling capabilities
  • Media failure:
  • only if both copies fail can we not recover the value of X; otherwise, we can read X.
  • Write failure: if a system failure (e.g. a power outage) occurs while writing X, the failure can be detected and repaired:
  • if the failure occurred while we were writing XL, the status of XL becomes bad, but XR still holds the old value of X;
  • if the failure occurred after we wrote XL, XL's status is good, and we write XR with the value in XL.

36
Exercises
  • 11.6.1

37
Recovery from disk crashes
  • The most common strategy is RAID, Redundant
    Arrays of Independent Disks.

38
The failure model for disks
  • Mean time to failure (MTTF): the length of time by which 50% of a population of disks will have failed catastrophically.
  • For modern disks: about 10 years.

39
A survival rate curve for disks
40
Mechanism
  • One or more disks hold the data (data disks), and
  • one or more added disks hold information that is completely determined by the contents of the data disks (redundant disks).
  • If either a data disk or a redundant disk crashes, the other disks can be used to restore it, so there is no permanent information loss.

41
Mirroring (RAID 1)
  • Mirror each data disk with a redundant disk.
  • Data loss occurs only when both disks fail simultaneously.
  • Example:
  • a disk has a 10% probability of failing in any given year;
  • replacing a failed disk takes 3 hours (1/2920 of a year);
  • the probability that the mirror fails during the replacement is 10% × 1/2920 = 1/29,200;
  • one of the two disks fails, on average, once every 5 years;
  • so the rate of data loss is (1/5) × (1/29,200) = 1/146,000 per year, i.e. once per 146,000 years.
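The mean-time-to-data-loss arithmetic can be checked in a few lines (all figures from the slide):

```python
p_fail_per_year = 0.10     # each disk: 10% chance of failing per year
replace_years = 3 / 8760   # 3-hour replacement = 1/2920 of a year
# Data is lost only if the surviving mirror fails inside that window:
p_loss_given_failure = p_fail_per_year * replace_years   # 1/29,200
# One of the two disks fails about once every 5 years, so:
mean_years_to_loss = 5 / p_loss_given_failure
print(round(mean_years_to_loss))   # 146000
```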

42
Parity blocks (RAID 4)
  • The mirroring approach uses as many redundant disks as data disks.
  • RAID 4 uses only 1 redundant disk no matter how many data disks there are.
  • Mechanism: assume all disks are identical; on the redundant disk, the ith block consists of parity checks for the ith blocks of all the data disks.

43
Example
  • 3 data disks 1,2,3. A redundant disk 4.
  • Disk 1 11110000
  • Disk 2 10101010
  • Disk 3 00111000
  • Then on redundant disk
  • Disk 4 01100010

44
Reading example
  • Suppose we are reading a block from disk 1, and another request comes in to read a different block of the same disk.
  • Ordinarily, we would have to wait for the first request to finish.
  • However, we can compute the requested block by reading the corresponding blocks from the other disks and taking their modulo-2 sum.

45
Reading example (Cont.)
  • Disk 2 10101010
  • Disk 3 00111000
  • Disk 4 01100010
  • We can get
  • Disk 1 11110000
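Both the parity block and the reconstruction above are the same XOR. A tiny helper over bit strings (a toy model, not a disk driver) verifies the slides' values:

```python
def parity(*blocks: str) -> str:
    """Bitwise modulo-2 sum (XOR) of equal-length bit strings."""
    return ''.join(str(sum(int(b[i]) for b in blocks) % 2)
                   for i in range(len(blocks[0])))

d1, d2, d3 = '11110000', '10101010', '00111000'
d4 = parity(d1, d2, d3)
print(d4)                  # '01100010': the redundant block
print(parity(d2, d3, d4))  # '11110000': disk 1 rebuilt from the others
```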

46
Writing
  • When we write a new block on a data disk, we must update the corresponding block of the redundant disk as well.
  • With n data disks, a naive approach needs n+1 disk I/Os.
  • A better approach needs only the old and new versions of the data block being written:
  • read the old value of the block to be changed;
  • read the corresponding block of the redundant disk;
  • write the new data block;
  • recalculate and write the block of the redundant disk.

47
Writing example
  • Disk 1 11110000
  • Disk 2 10101010
  • Disk 3 00111000
  • Disk 4 01100010
  • Now, change the block on disk 2 to 11001100.
  • Thus, the values in positions 2, 3, 6 and 7 have changed.
  • The block on the redundant disk becomes the modulo-2 sum of the change mask 01100110 and 01100010, which is 00000100.
  • Only 4 disk I/Os are needed.
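The update rule behind this example can be written as one function over bit strings (a toy sketch, not real block I/O):

```python
def updated_parity(old_parity: str, old_block: str, new_block: str) -> str:
    """New parity = old parity XOR old data XOR new data: only the
    positions that changed flip, so a write costs 4 I/Os for any n."""
    xor = lambda a, b: ''.join('1' if x != y else '0' for x, y in zip(a, b))
    return xor(old_parity, xor(old_block, new_block))

print(updated_parity('01100010', '10101010', '11001100'))  # '00000100'
```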

48
Failure recovery
  • If the redundant disk fails, replace it with a new disk and recompute the redundant blocks.
  • If one of the data disks fails, replace it with a new disk and recompute its data from the other disks.
  • Rule: the bit in any position is the modulo-2 sum of the bits in the corresponding positions of all the other disks.

49
Example
  • If disk 2 fails
  • Disk 1 11110000
  • Disk 2 ????????
  • Disk 3 00111000
  • Disk 4 01100010
  • Take the modulo-2 sum of each column
  • Disk 2 10101010

50
An improvement RAID 5
  • Problem with RAID 4:
  • with n data disks, the number of disk writes to the redundant disk is n times the average number of writes to any one data disk.
  • To solve this, RAID 5 treats each disk as the redundant disk for some of the blocks.

51
Example
52
Example
  • N = 4 (4 disks)
  • Each disk is the redundant disk for 1/4 of the blocks and a data disk for the other 3/4, so the average share of writes each disk receives is 1/4 + (3/4)(1/3) = 1/2.

53
Coping with multiple disk crashes (RAID 6)
  • RAID 4 and 5 cannot deal with multiple simultaneous crashes.
  • To deal with two simultaneous crashes, a Hamming code (an error-correcting code) is used.
  • In the following discussion:
  • 4 data disks (1, 2, 3, 4) and 3 redundant disks (5, 6, 7).

54
Redundancy pattern
55
Rules
  • The bits of disk 5 are the modulo-2 sum of the
    corresponding bits of disks 1,2, and 3
  • The bits of disk 6 are the modulo-2 sum of the
    corresponding bits of disks 1,2, and 4
  • The bits of disk 7 are the modulo-2 sum of the
    corresponding bits of disks 1,3, and 4
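The three rules can be expressed as one function over bit strings (a toy sketch); applied to the data blocks of the example on slide 58, it reproduces the redundant disks given there:

```python
def xor(*blocks: str) -> str:
    """Bitwise modulo-2 sum of equal-length bit strings."""
    return ''.join(str(sum(int(b[i]) for b in blocks) % 2)
                   for i in range(len(blocks[0])))

def redundant_disks(d1, d2, d3, d4):
    """Compute disks 5, 6 and 7 per the three rules above."""
    return xor(d1, d2, d3), xor(d1, d2, d4), xor(d1, d3, d4)

d5, d6, d7 = redundant_disks('11110000', '10101010', '00111000', '01000001')
print(d5, d6, d7)   # 01100010 00011011 10001001
```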

56
Reading
  • We can read data from any data disk normally; the redundant disks can be ignored.

57
Writing
  • The idea is similar, but now several redundant
    disks may be involved.

58
Example
  • Disk 1 11110000
  • Disk 2 10101010
  • Disk 3 00111000
  • Disk 4 01000001
  • Disk 5 01100010
  • Disk 6 00011011
  • Disk 7 10001001
  • Rewrite disk 2 to be 00001111; positions 1, 3, 6 and 8 change.
  • Since disks 5 and 6 are computed using disk 2, we must modify them accordingly:
  • Disk 5 11000111
  • Disk 6 10111110

59
Failure recovery (2 simultaneous disk crashes)
  • Disk 1 11110000
  • Disk 2 ????????
  • Disk 3 00111000
  • Disk 4 01000001
  • Disk 5 ????????
  • Disk 6 10111110
  • Disk 7 10001001
  • Since disk 6 is computed from disks 1, 2 and 4, we can recover the data on disk 2 from disks 1, 4 and 6:
  • Disk 2 00001111
  • Disk 5 is computed from disks 1, 2 and 3:
  • Disk 5 11000111
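The same XOR helper confirms this two-crash recovery (bit values from the slide; a toy sketch over bit strings):

```python
def xor(*blocks: str) -> str:
    """Bitwise modulo-2 sum of equal-length bit strings."""
    return ''.join(str(sum(int(b[i]) for b in blocks) % 2)
                   for i in range(len(blocks[0])))

d1, d3, d4 = '11110000', '00111000', '01000001'
d6 = '10111110'
d2 = xor(d1, d4, d6)   # disk 6 = d1 ^ d2 ^ d4, hence d2 = d1 ^ d4 ^ d6
d5 = xor(d1, d2, d3)   # then recompute disk 5 from its own rule
print(d2, d5)          # 00001111 11000111
```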

60
Exercises
  • 11.7.1
  • 11.7.4 a)
  • 11.7.5 a)
  • 11.7.6
  • 11.7.8 a)

61
Summary
  • Memory Hierarchy
  • Tertiary storage
  • Disk/secondary storage
  • Blocks, sectors, cylinders, etc.
  • Disk access time
  • TPMMS
  • Speed up disk access
  • Disk failure model
  • Checksum
  • Stable storage
  • RAID