File Organizations and Indexes - PowerPoint PPT Presentation

1 / 47
About This Presentation
Title:

File Organizations and Indexes

Description:

Rpm: rounds per minute. 2400, 3600, 7200 rpm. Ex. 2400 rpm, then each round takes 1/2400 min/round. 60*1000/2400 ... Half round. Block transfer time. Example ... – PowerPoint PPT presentation

Number of Views:46
Avg rating:3.0/5.0
Slides: 48
Provided by: cob3
Learn more at: https://faculty.sfsu.edu
Category:

less

Transcript and Presenter's Notes

Title: File Organizations and Indexes


1
File Organizations and Indexes
  • ISYS 464

2
Disk Devices
  • Disk drive Read/write head and access arm.
  • Single-sided, double-sided, disk pack
  • Track, sector, cylinder (tracks with the same
    diameter on the various disks)
  • Page, block, or physical record It is the unit
    of transfer between disk and primary storage, and
    vice versa.
  • Blocking factor the number of records in a block

3
Disk Speed
  • Rpm rounds per minute
  • 2400, 3600, 7200 rpm
  • Ex. 2400 rpm, then each round takes 1/2400
    min/round.
  • 601000/2400 25 msec/r

4
Time Required to Read One Block
  • Seek time
  • Rotational delay
  • Half round
  • Block transfer time

5
Example
  • A student file contains 20,000 records, each
    record has 113 bytes, assume each block is 512
    bytes, how many blocks needed?
  • Blocking factor floor(Block size/record size)
    floor(512/113)4
  • Number of blocks ceiling(number of
    records/blocking factor) 20,000/45,000 blocks

6
Linear Search, Binary search, and Direct Access
  • Assume seek s, rotational delay r, block
    transfer time tr, and file size is 5000 blocks,
  • then the average time to do a linear search is
  • s r tr(half of blocks) s r 2500tr
  • If the file is ordered by a key field, then the
    time to do a binary search is
  • (s r tr) Log25000
  • If index is available to enable direct access
  • s r tr

7
Linear Search and Binary search
  • Assume seek s, rotational delay r, block
    transfer time tr, and file size is 5000 blocks,
    then the average time to do a linear search is
  • s r tr(half of blocks) s r 2500tr
  • Binary search If the file is ordered by a key
    field, then the time to do a binary search is
  • . Number of blocks accessed given n blocks
    Log2n
  • . (s r tr) Log25000

8
Updating a Record
  • Read the block into main memory.
  • Change the record in main memory.
  • Write the block back to disk.

9
File Organizations
  • Technique for physically arranging records of a
    file on secondary storage
  • Factors for selecting file organization
  • Fast data retrieval and throughput
  • Efficient storage space utilization
  • Protection from failure and data loss
  • Minimizing need for reorganization
  • Accommodating growth
  • Security from unauthorized use
  • Types of file organizations
  • Sequential
  • Indexed
  • Hashed

10
Access Method
  • The steps involved in storing and retrieving
    records from a file.
  • Searching and updating

11
Unordered Files (Heap Files)
  • Records are placed in the file in the same order
    as they are inserted.
  • Searching must do a linear search if index is
    not available.
  • Updating
  • Insertion Read the last page, append to the last
    page, then write the page back.
  • Modification Search and read the block to main
    memory. Write the block back after making
    changes.
  • Deletion Mark the record for deletion (deletion
    flag) and periodically reorganize the file.

12
Ordered Files
  • Enable binary search
  • Insertion May need a temporary overflow file and
    periodically the overflow file is merged with the
    ordered file.
  • Deletion May need periodical reorganization.

13
Hash Files (Direct Files)
  • The page a record is to be stored is determined
    by a hash function.
  • Hash function calculates the address of the page
    based on the key field of the file
  • Address H(Key)
  • Typical hash function division/remainder
  • 0 lt Key Mod M lt M-1
  • Where M is the number of blocks allocated to this
    file.

14
Disk blocks
0
1
Block Address
2
0
1
2
3
3
4
5
4
6
7
5
6
H(K) -gt Block number Block address Physical
address
7
15
Hash File Example
  • 8 blocks, each block holds 2 records
  • Hash function Key Mod 8
  • Record keys
  • Key 1821, Key Mod 8 5
  • 7115, 3
  • 2428, 4
  • 4750, 6
  • 1620, 4
  • 4692, 4

16
Collision Resolution
  • Collision When a records home block is full.
  • Open addressing (linear probing) Place the
    record in the first available block.

17
Searching a Hash File
  • Home block H(SearchKey)
  • If found in the home block then search successful
  • Else
  • Search the next block until found or reach a
    block with empty space

18
Hash File Performance
  • Average Search Length (Total of blocks
    accessed to find all records)/(The number of
    records in the file)
  • Using the previous example
  • (1 1 1 2 1 1)/6 7/6
  • Time needed to find a record in this file
  • (s r tr) 7/6

19
Factors Affecting Hash File Performance
  • Hash file should spread the records evenly over
    the disk space.
  • Use of a low load factor
  • ( of records)/( of available spaces)
  • Allow each block to hold more records

20
Limitations of Hash File
  • Cannot be accessed by other order
  • Direct access only
  • Fixed amount of space allocated to the file
  • Static hashing
  • Waste space, hard to grow
  • Inappropriate for retrievals based on ranges of
    values
  • Find EmpID 123
  • Find EmpID gt 123

21
Index
  • A data structure that allows the DBMS to locate
    particular records in a file more quickly.
  • Index file
  • IndexField RecordPointer
  • Ordered according to the indexing field

22
Types of Index
  • Primary index Index on the primary key field.
  • Secondary index Index on a non-key field.

23
Index on Ordering Key Field
SID
S05,
SID
S07,
Block ptr
S10,
S05
S12
S12,
S25
S15,
S20,
S25,
S27,
S30,
Note The number of index entries equals the
number of file blocks.
24
Index on NonOrdering Key Field
SID
S25,
S47,
Record ptr
S12,
S05
S12
S22,
S20
S05,
S22
S20,
S33,
S27,
S30,
Note The number of index entries equals the
number of records.
25
Index on Ordering NonKey Field(Cluster Index)
SID
Major
S25,
ACCT
Major
S47,
Block ptr
ACCT
ACCT
S12,
ACCT
CIS
S22,
ACCT
FIN
S05,
CIS
S20,
CIS
CIS
S33,
FIN
S27,
FIN
S30,
26
Index on NonOrdering NonKey Field
SID
Major
S25,
CIS
FIN
Major
S47,
Record ptr
ACCT
S12,
ACCT
ACCT
S22,
ACCT
CIS
S05,
CIS
CIS
S20,
FIN
CIS
FIN
MKT
S33,
CIS
S27,
FIN
S30,
27
Physical pointer vs Logical Pointer
When index on the key field is available, index
on nonkey field can use record keys as logical
pointers.
SID
Major
S25,
CIS
FIN
S47,
SID
Major
ACCT
S12,
ACCT
S12
S22
ACCT
S22,
ACCT
CIS
S25
S05,
CIS
CIS
S05
S20,
FIN
S27
CIS
FIN
S47
MKT
S33,
CIS
S27,
FIN
S30,
28
Physical pointer vs Logical Pointer
When index on the key field is available, index
on nonkey field can use record keys as logical
pointers.
SID
Major
SID is a logical pointer
S25,
CIS
FIN
S47,
SID
Major
ACCT
S12,
ACCT
S12
S22
ACCT
S22,
ACCT
CIS
S25
S05,
CIS
CIS
S05
S20,
FIN
S27
CIS
FIN
S47
MKT
S33,
CIS
S27,
FIN
S30,
The location of S12 can be found by search the
primary index.
29
Searching with Index
A file with 30,000 records, each record has 100
bytes, block size is 1024 bytes . Data file
blocking factor floor(1024/100)10 . Data file
blocks ceiling(30,000/10)3000 blocks If key
field has 9 bytes, and physical pointer has 6
bytes, so each index entry has 15 bytes . Index
file blocking factor floor(1024/15) 68 .
Index file blocks ceiling(30,000/68) 442
blocks Time to search for a record with the
index is . Binary search the index Log2442 .
One data file access . Time (s rd tr) (1
Log2442)
30
Tree
  • Nodes
  • Regular nodes (internal nodes) nodes with parent
    and children
  • Root node node with no parent
  • Leaf nodes nodes with no children
  • Level length of the path from the root to a
    node.
  • Root level 0
  • Balanced tree All leaf nodes are at the same
    level.

31
B -Trees
  • If a node can store n pointers (n-1 keys), then
    each node except root and leaf nodes has at least
    ceiling(n/2) pointers.
  • Each key in the tree represents (key
    RecordPointer)
  • All leaf nodes are at the same level.
  • When a node split, it splits into two nodes at
    the same level, and the middle key is moved up to
    its parent node.

32
B-Tree Examples
  • A B-Tree with 3 pointers (2 keys) in a node,
    insert keys 8, 5, 1,7, 3, 12, 9, 6, 4
  • A B-Tree with 4 pointers (3 keys) in a node,
    insert keys 23, 65, 37, 60, 46, 92, 48, 71, 56,
    59, 100, 95

33
B Trees
  • Record pointers are stored only at the leaf
    nodes.
  • More keys in a node, shorter path
  • Every key must exist at the leaf nodes.
  • Every leaf node contains pointer to the next leaf
    node.
  • Node Split
  • Leaf node split keep the middle key in the left
    node and duplicate it in the parent node.
  • Internal node split move up the middle key as
    B-Tree.

34
B Tree Examples
  • A B Tree with 3 pointers (2 keys) in a node,
    insert keys 8, 5, 1, 7, 3, 12, 9, 6
  • A B Tree with 4 pointers (3 keys) in a node,
    insert keys 23, 65, 37, 60, 46, 92, 48, 71, 56,
    59, 100, 95

35
B Tree Advantages
  • Shorter tree Because internal nodes do not
    include record pointers, internal nodes can have
    more keys.
  • All keys in the leaf nodes are already in sorted
    order.
  • B Tree can be used to store data file.

36
  • Bitmap saves on space requirements
  • Rows - possible values of the attribute
  • Columns - table rows
  • Bit indicates whether the attribute of a row has
    the values

Figure 6-8 Bitmap index index organization
The bitmap index is used where the values of a
field repeats very frequently, it is not used for
primary key index.
37
  • Too many indexes will slow down update operations.

38
(No Transcript)
39
Rules for Using Indexes
  1. Use on larger tables
  2. Index the primary key of each table
  3. Index search fields (fields frequently in WHERE
    clause)
  4. Fields in SQL ORDER BY and GROUP BY commands
  5. When there are gt100 values but not when there are
    lt30 values

40
Rules for Using Indexes (cont.)
  • Avoid use of indexes for fields with long values
    perhaps compress values first
  • DBMS may have limit on number of indexes per
    table and number of bytes per indexed field(s)
  • Null values will not be referenced from an index
  • Use indexes heavily for non-volatile databases
    limit the use of indexes for volatile databases
  • Why? Because modifications (e.g. inserts,
    deletes) require updates to occur in index files

41
Redundant Arrays of Inexpensive (Independent)
Disks
  • RAID is a method to group more than one drive and
    make them appear as a single drive.

42
RAID 0
  • Creating a stripe set without parity
  • Spreads the data out over various disks

Disk 0
Disk 1
Disk 2
Disk 3
1A
2A
3A
4A
1B
2B
3B
4B
1C
2C
3C
4C
No redundancy Best write performance disk can be
accessed in parallel Unreliable
43
Figure 6-10 RAID with four disks and striping
Here, pages 1-4 can be read/written simultaneously
44
RAID 1
  • Mirror set
  • Primary disk and mirror disk
  • 2 writes
  • Data can be accessed from either disk.
  • Fault tolerance

45
RAID 5
  • Creating a stripe set with parity

Disk 0
Disk 1
Disk 2
Disk 3
ParityA
1A
2A
3A
1B
Parity B
2B
3B
1C 1D
2C
Parity C
3C
2C
3D
Parity D
46
Exclusive OR, XOR
  • Condition 1

Condtion 1 XOR Condition 2
Condition 1
Condition 2
T
T
F
T
F
T
F
T
T
F
F
F
47
Creating Parity with XOR
1A1010, 2A0100, 3A1100 ParityA(1A XOR 2A) XOR
3A 0010
Disk 0
Disk 1
Disk 2
Disk 3
ParityA
1A
2A
3A
If Disk 0 fails Recover by using (1A XOR 2A)
XOR 3A If Disk 1 fails Recover by using
(ParityA XOR 2A) XOR 3A
Write a Comment
User Comments (0)
About PowerShow.com