CS 728 Advanced Database Systems Chapter 16 - PowerPoint PPT Presentation


PPT – CS 728 Advanced Database Systems Chapter 16 PowerPoint presentation | free to download - id: 71d24d-ZjhkZ


The Adobe Flash plugin is needed to view this content

Get the plugin now

View by Category
About This Presentation

CS 728 Advanced Database Systems Chapter 16


CS 728 Advanced Database Systems Chapter 16 Database File Organization: Unordered, Ordered, and Hashed Files of Records Disk I/O Model of Computation Disk I/O is ... – PowerPoint PPT presentation

Number of Views:21
Avg rating:3.0/5.0
Slides: 53
Provided by: eduj1
Learn more at: http://www.just.edu.jo


Write a Comment
User Comments (0)
Transcript and Presenter's Notes

Title: CS 728 Advanced Database Systems Chapter 16

CS 728 Advanced Database Systems Chapter 16
  • Database File Organization
  • Unordered, Ordered, and Hashed Files of Records

Disk I/O Model of Computation
  • Disk I/O is equivalent to one read or write
    operation of a single block
  • It is very expensive compared with what is likely
    to be done once the block gets in main memory
  • one random disk I/O about 1,000,000 machine
    instructions in terms of time
  • Cost for computation that requires secondary
    storage is computed only by disk I/Os.

Disk Storage Devices
  • Preferred secondary storage device for high
    storage capacity and low cost.
  • Data stored as magnetized areas on magnetic disk
  • A disk pack contains several magnetic disks
    connected to a rotating spindle.
  • Disks are divided into concentric circular tracks
    on each disk surface.
  • Track capacities vary typically from 4 to 50
    Kbytes or more

Disk Storage Devices (cont.)
  • A track is divided into smaller blocks or sectors
  • because it usually contains a large amount of
  • The division of a track into sectors is
    hard-coded on the disk surface and cannot be
  • One type of sector organization calls a portion
    of a track that subtends a fixed angle at the
    center as a sector.
  • A track is divided into blocks.
  • The block size B is fixed for each system.
  • Typical block sizes range from B512 bytes to
    B4096 bytes.
  • Whole blocks are transferred between disk and
    main memory for processing.

Disk Storage Devices (cont.)
Disk Storage Devices (cont.)
  • A read-write head moves to the track that
    contains the block to be transferred.
  • Disk rotation moves the block under the
    read-write head for reading or writing.
  • A physical disk block (hardware) address consists
  • a cylinder number (imaginary collection of tracks
    of same radius from all recorded surfaces)
  • the track number or surface number (within the
  • and block number (within track).
  • Reading or writing a disk block is time consuming
    because of the seek time s and rotational delay
    (latency) rd.
  • Double buffering can be used to speed up the
    transfer of contiguous disk blocks.

Disk Storage Devices (cont.)
Typical Disk Parameters
Pages and Blocks
  • Data files decomposed into pages (blocks)
  • fixed size piece of contiguous information in the
  • sizes range from 512 bytes to several kilobytes
  • block is the smallest unit for transferring data
    between the main memory and the disk.
  • Address of a page (block)
  • (cylinder, track (within cylinder), sector
    (within track)

Pages and Blocks
Page I/O
  • Page I/O --- one page I/O is the cost (or time
    needed) to transfer one page of data between the
    memory and the disk.
  • The cost of a (random) page I/O
  • seek time rotational delay block transfer
  • Seek time
  • time needed to position read/write head on
    correct track.
  • Rotational delay (latency)
  • time needed to rotate the beginning of page under
    read/write head.
  • Block transfer time
  • time needed to transfer data in the page/block.

Page I/O
  • Average rotational delay (rd)
  • rd ½ (1/p) min (601000)/(2p) msec
  • OR
  • rd ½ cost of 1 revolution
  • ½ (601000/p) msec
  • where
  • p is speed of disk rotation (how many revolutions
    per minute - rpm)
  • Example
  • Speed of disk rotatioon is p 3600 rpm
  • 60 revolutions/sec
  • 1 rev. 16.66 msec. (1 second 1000 msec)
  • rd 8.33 ms

Page I/O
  • Transfer rate (tr)
  • tr track size / cost of one revolution
  • track size / (601000/p) in msec
  • Bulk transfer rate (btr)
  • btr (B/(BG)) tr bytes/msec
  • Where B is the block size in bytes
  • G is interblock gap size in bytes
  • Block transfer time (btt)
  • btt B / tr not taking into acount G
  • btt B / btr taking into acount G

Page I/O
  • Example
  • Track size 50 KB and p 3600 rpm
  • Block size B 3KB 3000 bytes
  • tr (501000)/(601000/3600) 3000 bytes/msec
  • btt B / tr 3000/3000 1 msec

Page I/O
  • Average time for reading/writing n consecutive
    pages that are in the same track or cylinder s
    rd n btt
  • Average time for reading/writing consecutively n
    noncontigues pages/blocks that are in the same
    cylinder s n (rd btt)

An Example
  • A hard disk specifications
  • 4 platters, 8 Surfaces, 3.5 Inch diameter
  • 213 8192 tracks/surface
  • 28 256 sectors/track
  • 29 512 bytes/sector
  • Average seek time s 25 ms
  • Rotation rate rd 3600 rpm 60 rps
  • 1 rev. 16.66 msec
  • Transfer rate
  • tr 1 KB in 0.117 ms
  • tr 1 KB in 0.130 ms with gap

An Example
  • What is the total capacity of this disk
  • 8 GB (82132829233)
  • How many bytes does one track hold?
  • 256 sectors/track512 bytes/sector 128KB
  • How many blocks per track?
  • one block 4096 bytes 8 sectors (4096/512)
  • 256/8 32 blocks/track

An Example
  • How long does it take to access one block?
  • One block 4096 bytes
  • 8 sectors 4096/512
  • Rotation rate r
  • 1 rev. 16.66 msec.
  • Time to access 1 sector (s r/2
  • 25 (16.66/2) .117/2 33.3885 ms.
  • time to access 1 block
  • time to access the first sector of the block
    time to access the subsequent 7 sectors.

An Example
  • T 25 (16.66/2) (0.117/2) 1 (0.13/2) 7
  • 33.3885 0.455 ms 33.8435ms
  • Compare to one sector access time 33.3885 ms

  • A buffer
  • is a contiguous reserved area in main memory
    available for storage of copies of disk blocks.
  • to speed up the processes.
  • For a read command
  • the block from disk is copied into the buffer.
  • For a write command
  • the contents of the buffer are copied into the

Accessing Data Through RAM Buffer
File Organization
  • The database is stored as a collection of files.
  • Each file is a sequence of records.
  • A record is a sequence of fields.
  • Records are stored on disk blocks.
  • A file can have fixed-length records or
    variable-length records.

File Organization
  • Fixed length records
  • Each record is of fixed length. Pad with spaces.
  • Variable length records
  • different records in the file have different
  • Arise in database systems in several ways
  • different record types in a file.
  • same record type with (variable-length fields,
    repeating field, or optional fields)

File Organization
Fixed-Length Records
  • Insertion
  • Store record i starting from byte n ? (i 1),
    where n is the size of each record.
  • Deletion of record i
  • Packed format
  • move records i 1, . . ., n to i, . . . , n 1
  • OR
  • move record n to i
  • Unpacked format (do not move records, but)
  • link all free records on a free list
  • OR
  • Use bitmap vector

Free Lists
  • Store the address of the first deleted record in
    the file header.
  • Use this first record to store the address of the
    second deleted record, and so on.

Page Formats Fixed Length Records
  • Record id ltpage id, slot gt.

Variable-Length Records Representation
  • Byte-String representation
  • Attach an end-of-record (?) control character to
    the end of each record
  • Difficulty with deletion and growth
  • Slotted-page header contains
  • number of record entries
  • location and size of each record
  • end of free space in the block

Slotted Page Structure
  • Records can be moved around within a page to keep
    them contiguous with no empty space between them
  • entry in the header must be updated.
  • Pointers should not point directly to record -
    instead they should point to the entry for the
    record in header.

Fixed-Length Representation
  • Reserved Space
  • can use fixed-length records of a known maximum
  • unused space in shorter records filled with a
    null or end-of-record symbol.

Fixed-Length Representation
  • List Representation by Pointers
  • A variable-length record is represented by a list
    of fixed-length records, chained together via
  • Can be used even if the maximum record length is
    not known

Fixed-Length Representation
  • Disadvantage space is wasted in all records
    except the first in a a chain.
  • Solution is to allow two kinds of block in file
  • Anchor block contains the first records of chain
  • Overflow block contains records other than those
    that are the first records of chairs.

Blocking Factor
  • Blocking Factor (bfr) - the number of records
    that can fit into a single block.
  • bfr ?B/R?
  • B Block size in bytes
  • R Record size in bytes
  • Example
  • Record size R 100 bytes
  • Block Size B 2,000 bytes
  • Thus the blocking factor bfr 2000/100 20
  • The number of blocks b needed to store a file of
    r records
  • b ?r/bfr? blocks

Spanned Unspanned Records
  • A block is the unit of data transfer between disk
    and memory.
  • Unspanned records
  • A record is found in one and only one block.
  • records do not span across block boundaries.
  • Used with fixed-length records having B ? R
  • Spanned records
  • Records are allowed to span across block
  • Used with variable-length records having R ? B
  • In variable-length records, either organization
    can be used.

Placing File Records on Disk
  • A file header or file descriptor contains
    information about a file (e.g., the disk address,
    record format descriptions, etc.)

Allocating File Blocks on Disk
  • The physical disk blocks that are allocated to
    hold the records of a file can be contiguous,
    linked, or indexed.
  • In contiguous allocation, the file blocks are
    allocated to consecutive disk blocks.
  • In linked allocation, each file block contains a
    pointer to the next file block.
  • In indexed allocation, one or more index blocks
    contain pointers to the actual file blocks.

Organization of Records in Files
  • Heap/Unordered/Pile File Organization
  • a record can be placed anywhere in the file where
    there is space, or at the end
  • for full file scans or frequent updates
  • Data unordered (unsorted)
  • Sorted/Ordered File Organization
  • store records sorted in order, based on the value
    of the search key of each record
  • Need external sort or an index to keep sorted
  • Hashing File Organization
  • a hash function computed on some attribute of
    each record
  • the result specifies in which block of the file
    the record should be placed

Heap File Organization
  • Records are placed in the file in the order in
    which they are inserted. Such an organization is
    called a heap file.
  • Insertion is at the end
  • takes constant time O(1) (very efficient)
  • Searching
  • requires a linear search (expensive)
  • Deleting
  • requires a search, then delete
  • Select, Update and Delete
  • take b/2 time (linear time) in average
  • b is the number of blocks

Heap File Organization
  • For a file of unordered fixed-length records
    using unspanned blocks and contiguous allocation,
    it is straightforward to access any record by its
    position in the file.
  • If the records are numbered 0,1,2, , r-1 and
  • The records in each block are numbered 0,1,2, ,
    f-1, where f is the blocking factor
  • The the i-th record of the file is located in
  • Block ?i/f? and in the
  • (i mod f)-th record in that block

Heap File Organization
  • A Heap file allows us to retrieve records
  • by specifying the rid, or
  • by scanning all records sequentially
  • Accessing a record by its position does not help
    locate a record based on a search condition.

File Stored as a Heap File
666666 MGT123 F1994 4.0 123456
CS305 S1996 4.0 page 0 987654
CS305 F1995 2.0 717171 CS315
S1997 4.0 666666 EE101 S1998
3.0 page 1 765432 MAT123 S1996
2.0 515151 EE101 F1995
3.0 234567 CS305 S1999 4.0

page 2 878787 MGT123 S1996
Sequential File Organization
  • Suitable for applications that require sequential
    processing of the entire file
  • The records in the file are ordered by a

Files of Ordered Records
  • Some blocks of an ordered (sequential) file of
    EMPLOYEE records with NAME as the ordering key

File Stored as a Sorted File
111111 MGT123 F1994 4.0 111111
CS305 S1996 4.0 page 0 123456
CS305 F1995 2.0 123456 CS315
S1997 4.0 123456 EE101 S1998
3.0 page 1 232323 MAT123 S1996
2.0 234567 EE101 F1995
3.0 234567 CS305 S1999 4.0

page 2 313131 MGT123 S1996
Sequential File Organization
  • Insertion is expensive
  • records must be inserted in the correct order
  • locate the position where the record is to be
  • if there is free space insert there
  • if no free space insert the record in an overflow
  • In either case, pointer chain must be updated
  • Insert takes lg(b) plus the time to re-organize
  • b is the number of blocks
  • Deletion
  • use pointer chains
  • Searching
  • very efficient (Binary search)
  • This requires lg(b) on the average

Sequential File Organization
Average Access Times
  • The following table shows the average access time
    to access a specific record for a given type of

Hashed Files
  • Hashing for disk files is called External Hashing
  • The file blocks are divided into M equal-sized
    buckets, numbered bucket0, bucket1, ...,
  • Typically, a bucket corresponds to one (or a
    fixed number of) disk block.
  • One of the file fields is designated to be the
    hash key of the file.
  • The record with hash key value K is stored in
    bucket i, where ih(K), and h is the hashing
  • Search is very efficient on the hash key.
  • Collisions occur when a new record hashes to a
    bucket that is already full.
  • An overflow file is kept for storing such
  • Overflow records that hash to each bucket can be
    linked together.

Hashed Files (cont.)
  • There are numerous methods for collision
    resolution, including the following
  • Open addressing Proceeding from the occupied
    position specified by the hash address, the
    program checks the subsequent positions in order
    until an unused (empty) position is found.
  • Chaining For this method, various overflow
    locations are kept, usually by extending the
    array with a number of overflow positions. In
    addition, a pointer field is added to each record
    location. A collision is resolved by placing the
    new record in an unused overflow location and
    setting the pointer of the occupied hash address
    location to the address of that overflow
  • Multiple hashing The program applies a second
    hash function if the first results in a
    collision. If another collision results, the
    program uses open addressing or applies a third
    hash function and then uses open addressing if

Hashed Files (cont.)
Hashed Files (cont.)
  • To reduce overflow records, a hash file is
    typically kept 70-80 full.
  • The hash function h should distribute the records
    uniformly among the buckets
  • Otherwise, search time will be increased because
    many overflow records will exist.
  • Main disadvantages of static external hashing
  • Fixed number of buckets M is a problem if the
    number of records in the file grows or shrinks.
  • Ordered access on the hash key is quite
    inefficient (requires sorting the records).

Hashed Files - Overflow Handling
About PowerShow.com