File Structures - PowerPoint PPT Presentation

1 / 27
About This Presentation
Title:

File Structures

Description:

File Structures Dale-Marie Wilson, Ph.D. Basic Concepts Primary storage Main memory Inappropriate for storing database Volatile Secondary storage Physical storage e.g ... – PowerPoint PPT presentation

Number of Views:142
Avg rating:3.0/5.0
Slides: 28
Provided by: dwil87
Category:

less

Transcript and Presenter's Notes

Title: File Structures


1
File Structures
  • Dale-Marie Wilson, Ph.D.

2
Basic Concepts
  • Primary storage
  • Main memory
  • Inappropriate for storing database
  • Volatile
  • Secondary storage
  • Physical storage e.g. magnetic disks
  • Nonvolatile
  • Cheaper

3
Basic Concepts
  • 2 storage organized into files
  • Each file has one or more records
  • Each record has one or more fields Process
  • User requests tuple e.g. SG37
  • DBMS maps logical record to physical record
  • Physical record moved to DBMS buffers
  • N.B. Physical record is unit of transfer between
    disk and primary storage

4
Basic Concepts
  • Physical record typically consists of more than 1
    logical record
  • Logical record can correspond to more than 1
    physical record
  • Refer to physical record as blocks and pages

staffNo lName position branchNo
SL21 White manager B005
SG37 Beech Assistant B003
SG14 Ford Supervisor B003
SA9 Howe Assistant B007
SG5 Brand Manager B003
SL41 Lee Assistant B005
Page
1
2
5
Basic Concepts
  • File organization
  • Physical arrangement of data in file into records
    and pages in 2 storage
  • Determines order records stored and accessed
  • Types
  • Heap (unordered)
  • Records place on disk in no specific order
  • Sequential (ordered)
  • Records ordered by value of specific field
  • Hash
  • Records placement determined by hash function

6
Basic Concepts
  • Access method
  • Steps involved in storing and retrieving records
    from file

7
Heap Files
  • Unordered files
  • Aka heap files
  • Simplest organization
  • Records placed in same order inserted
  • Linear search for retrieval
  • Insertion efficient retrieval not efficient
  • Deletion process
  • Relevant page identified
  • Record marked as deleted
  • Page rewritten to disk
  • N.B. deleted record space not reused ?
    performance deterioration
  • Best suited for bulk loading data

8
Ordered Files
  • Ordered files
  • Aka sequential files
  • Sorted on field ordering field
  • If ordering field key ? ordering key
  • Binary search for retrieval
  • Insertion and deletion problematic
  • Need to maintain order of records
  • Rarely used unless 1 index exists

9
Hash Files
  • Hash files
  • Aka random/direct files
  • Hash function used to det. page address for
    storing record
  • Chosen to provide most even distribution of
    records min. collisions
  • Examples
  • Folding applying arithmetic function to hash
    field e.g. 7
  • Division-remainder uses mod function to det.
    field value
  • Each address corresponds to a page/bucket
  • Each bucket has slots for multiple records
    placed in order of arrival
  • Base field hash field
  • If hash field key ? hash key
  • Collision
  • Hash function does not calculate unique address
    for 2 or more records

10
Hash Files
  • Collision management techniques
  • Open addressing
  • Unchained overflow
  • Chained overflow
  • Multiple hashing

11
Collision Management
  • Open addressing
  • Linear search performed to locate 1st available
    slot
  • Same procedure for searching for record
  • Record doesnt exist if empty slot found before
    record located

12
Collision Management
  • Unchained overflow
  • Overflow area maintained for collisions
  • Improves over open addressing by minimizing
    collisions

Bucket
Bucket
Staff SA9 record Staff SL21 record
Staff SL41 record
0
3
4
Staff SG37 record
1
Staff SG5 record Staff SG14 record
2
13
Collision Management
  • Chained overflow
  • Overflow area maintained for collisions
  • Uses synonym pointer
  • Additional field that indicates whether collision
    occurred
  • If collision, contains bucket address of overflow
    area

Bucket
Bucket
Staff SA9 record Staff SL21 record
Staff SG7 record
0
3
0
4
Staff SG37 record
1
0
Staff SG5 record Staff SG14 record
2
3
14
Collision Management
  • Multiple hashing
  • If collision occurs, new hash function performed
  • 2nd hash function typically used to place record
    in overflow area

15
Indexes
  • Index
  • Data structure that allows DBMS to locate
    particular records in file more quickly
  • Similar to index in book
  • Main types of indices
  • Primary index
  • Index a key field
  • Clustering index
  • File sequentially ordered on non-key field i.e.
    more than record can correspond with index
  • Secondary index
  • Index defined on non-ordering field of data file

16
Indexes
  • File can have
  • At most 1 primary or 1 clustering index
  • Several secondary indices
  • Index may be
  • Dense
  • Index record for every search key value
  • Sparse
  • Index record for some key search values

17
Indexed Sequential Files
  • Indexed sequential file
  • Sorted data file with primary index
  • Has
  • Primary storage area
  • Separate index
  • Overflow area

18
Multilevel Index
  • Multilevel index
  • Index treated as file and split into smaller
    indices
  • Overcomes problems with large indices that span
    several pages

19
B Trees
  • Search Tree
  • Used to guide search for a record, given the
    value of one of its fields
  • Two types of Nodes
  • Internal Nodes contain Key values and node
    pointers
  • Leaf Nodes contain Key, Record-Pointer pairs
  • Degree/order
  • Max children allowed
  • B-tree balanced tree
  • Depth from root to leaf same for every leaf

20
B Trees
  • The structure of internal nodes in a B tree of
    order p
  • Each internal node is of the form
  • ltP1, K1, P2, K2, ..., Pq-1, Kq-1, Pq gt , where q
    lt p , each Pi  is a tree pointer
  • Within each internal node, K1 lt K2 lt ... lt Kq-1 
  • For all values of X in the subtree pointed at by
    Pi , we have
  • Ki-1 lt X lt Ki  for 1 lt i lt q , X lt Ki for iq,
    and Ki-1 lt X for iq
  • Each internal node has at most p tree pointers
  • Each internal node, except the root, has at least
    (p/2) tree pointers. The root node has at least
    two tree pointers if it is an internal node.
  • An internal node with q pointers, q lt p, has q-1
    search field values.

21
B Trees
22
B Trees
  • The structure of leaf nodes in a B tree of order
    p
  • Each leaf node is of the form
  • lt ltK1,Pr1gt,  ltK2,Pr2gt,  ..., ltKq-1,Prq-1gt,  Pnext
    gt , where q lt p , each Pri  is a data pointer
    that points to a record or block of records
  • Within each internal node, K1 lt K2 lt ... lt Kq-1 
  • Each leaf node, has at least (p/2) values
  • All leaf nodes are at the same level
  • The Pnext pointer points to the next leaf node in
    the tree
  • This give efficient sequential access to data

23
B Trees
24
B Trees
  • Insertion example for B Tree
  • When you insert into a leaf node that is full,
    you split and pass the rightmost value up to the
    parent
  • When you insert into a full root, the root splits
    and a new root is created with the middle value
    from the child nodes
  • Otherwise, values are inserted into openings at
    the lowest level

25
(No Transcript)
26
(No Transcript)
27
  • Appendix F
  • Assignment 7
Write a Comment
User Comments (0)
About PowerShow.com