Database design with a relational model PowerPoint PPT Presentation

presentation player overlay
1 / 15
About This Presentation
Transcript and Presenter's Notes

Title: Database design with a relational model


1
Database design with a relational model
  • Internal Level Physical Storage and Access

2
Ordered Files
  • One field (attribute) selected for ordering
  • If a key field, data is key-sequenced
  • Allows binary searches for faster retrieval
    (always retrieves mid-page between upper and
    lower limits until correct page is found), since
    log2(B) blocks accessed
  • Inserts and deletes require ordering to be
    maintained (may require writing all pages above
    affected record)
  • Overflow (transaction) file will help to reduce
    this problem
  • Typically only used if a primary index is applied
  • No gain for non-ordered fields
  • Typically requires indexed file access path

3
File storage Ordered with linked lists
  • Assume one record/page (i.e., 5B/Fixed Block)
  • Then
  • Insert Dad
  • Delete Geode
  • Insert Pod

4
File storage Ordered with linked lists
  • Assume one record/page (i.e., 5B/Fixed Block)
  • Then
  • Insert Dad
  • Delete Geode
  • Insert Pod

5
Hash Files
  • Records written in non-sequential order
  • Hash function calculates address of the page
    where record is stored
  • based on a one or more base fields (hash field)
  • If a key field, called hash key
  • Hash function creates even spread of records
    across file
  • Folding applies math to different parts of the
    has field (empID 0110 could become (01)10. 11
    is address of disk page
  • Division-remainder uses mod 0110 mod 100. 10 is
    address of the disk page
  • Can use
  • Open addressing, Unchained overflow, Chained
    overflow, Multiple hashing, Dynamic hashing
  • Limitations
  • Useful for exact match
  • Poor for ranges, patterns, additional fields

6
Hash Files
  • Extendable hashing example
  • A sequence of has function based on a single hash
    function
  • Hashes a unique search-key value into a b-bit
    integer identifying page location
  • For each k, 0 lt k lt b, hk(v) is the integer
    formed by the last k bits of h(v)
  • This makes the range of each function twice its
    predecessor
  • Buckets are created as needed
  • A directory (Bucket Address Table) contains the
    depth (the number of bit positions used)
  • Record can be found in two reads, with the first
    typically in memory
  • No relational logic utilized

Depth 3
Depth 2
Depth 1
7
Indexes
  • Ordered indices values in sorted fashion
  • Hash indices values distributed across
    buckets by using a function
  • An index record consists of a value and pointers
    to one or more records with that value. Can be
  • Dense every value group indexed
  • Sparse only some values are indexed
  • Include
  • Compound indexes (values from more than one data
    column)
  • Covering index (uses values in the index for the
    SELECT clause)
  • Unique index
  • Clustering indexes (stores similar data rows near
    each other)
  • Bitmap indexes (assigns 1 if a value is true, 0
    if false)

8
B-Trees
  • Well established as the most common structures
    for indexes
  • Multi-level
  • d is the order of the tree it is a measure of
    the tree node capacity
  • Every node except the root contains m entries,
    where d/2 lt m lt d
  • The root node contains 1 lt m lt d entries
  • Non-leaf nodes with m index entries contain m1
    pointers to children
  • Pointer Pi points to a subtree with K values such
    that Ki-1 lt K lt Ki

9
B-Trees (order 2) one algorithm
  • Query find all values with a pointer value of P
  • If search value is lt SearchKey value, go left
    otherwise, go right

10
B Tree index/sequence sets
Index Set
Sequence Set
11
B Trees datapage leaf
Sort order Anizy,Apach,Apensen,Ardwick,Arnham,Ath
ens
12
B Tree Example
  • Rules for this example
  • d is the order of the tree it is a measure of
    the capacity of child nodes
  • Every node except the root contains m entries,
    where d/2 lt m lt d
  • The root node contains 1 lt m lt d entries
  • Non-leaf nodes with m index entries contain
    between (m1)/2 and m1 pointers to children
  • Pointer Pi points to a subtree with K values such
    that Ki-1 lt K lt Ki
  • Search uses pointer to the right for greater than
    or equal to in non-leaf nodes, greater than in
    leaf nodes until equal to is found or not found

13
Records in a block
  • How to store records in blocks?
  • number of records r
  • block size B
  • record size R
  • blocking factor bf number of records in a
    block
  • Bf ?B/R? (spanned, unspanned)
  • number of blocks needed b
  • b ?r/bf?

14
B-trees performance impact
  • A 4k page can many records per page
  • ((4 b/pointer 4b/field)n, 4b/pointer) order
    of 512
  • Root 511 records
  • Level 1 261,632 records
  • Level 2 133,955,584 records
  • Total 134,217,727 records
  • Shallow is better

15
B-trees performance impact
  • 1,000,000 records of 300B (including header)
  • Search key is a 4 byte int a pointer requires 4
    bytes
  • 4KB blocks, no block header, random placement,
    avg. retrieval time 5.6 ms
  • No time for memory reads
  • 13.6 records/block 76924 blocks to store data
  • 512 indexes/block 1954 blocks to store index
  • No index
  • (76924/2) 38462 block accesses (avg.)
  • Time to find a record 38462 5.6 ms 215.4 s
  • Indexed, binary search
  • log(1954) 1 11 1 12 block accesses
    (maximum)time to find a record 12 5.6 ms
    67.2 ms
  • Indexing increased speed by 3205 times.
Write a Comment
User Comments (0)
About PowerShow.com