Indexes%20on%20Sequential%20Files - PowerPoint PPT Presentation

About This Presentation
Title:

Indexes%20on%20Sequential%20Files

Description:

Numeric Example Again. Suppose we put a second-level index on the first-level sparse index ... is itself sequential, so same strategies as for modifying data ... – PowerPoint PPT presentation

Number of Views:23
Avg rating:3.0/5.0
Slides: 40
Provided by: JeffU4
Category:

less

Transcript and Presenter's Notes

Title: Indexes%20on%20Sequential%20Files


1
Indexes on Sequential Files
Source our textbook, slides by
Hector Garcia-Molina
2
How to Represent a Relation
  • Suppose we scatter its records arbitrarily among
    the blocks of the disk
  • How to answer SELECT FROM R?
  • Scan every block
  • ridiculously slow
  • would require lots of overhead info in each block
    and each record header

3
How to Represent a Relation
  • Reserve some blocks for the relation
  • No need to scan entire disk
  • How to answer SELECT FROM R WHERE cond ?
  • Scan all the records in the reserved blocks
  • Still ridiculously slow

4
Indexes
  • Use indexes -- special data structures -- that
    allow us to find all the records that satisfy a
    condition "efficiently"
  • Possible data structures
  • simple indexes on sorted files
  • secondary indexes on unsorted files
  • B-trees
  • hash tables

5
Sorted Files
  • Sorted file records (tuples) of the file
    (relation) are in sorted order of the field
    (attribute) of interest.
  • This field might or might not be a key of the
    relation.
  • This field is called the search key.
  • A sorted file is also called a sequential file.

6
Index on Sequential File
  • An index is another file containing key-pointer
    pairs of the form (K,a)
  • K is a search key
  • a is an address (pointer)
  • The record at address a has search key K
  • Particularly useful when the search key is the
    primary key of the relation

7
Dense Indexes
  • An index with one entry for every key in the data
    file
  • What's the point?
  • Index is much smaller than data file when record
    contains much more than just the search key
  • If index is small enough to fit in main memory,
    record with a certain search key can be found
    quickly binary search in memory, followed by
    only one disk I/O

8
Example of a Dense Index
9
Some Numbers
  • relation with 1,000,000 tuples
  • block size is 4096 bytes
  • 10 records per block
  • thus 100,000 blocks, gt 400 Mbytes
  • key field is 30 bytes
  • pointer is 8 bytes
  • thus at least 100 key-pointer pairs per block
  • thus dense index size is 10,000 blocks, about 40
    Mbytes
  • since log(10,000) 13, takes at most 14 disk
    I/O's for a search

10
Sparse Index
  • Uses less space than a dense index
  • Requires more time to find a record with a given
    key
  • In a sparse index, there is just one
    (key,pointer) pair per data block.
  • The key is for the first record in the block.

11
Sparse Index Example
12
Using a Sparse Index
  • To find the record with key K, search the index
    for the largest key K
  • Use binary search to do this
  • Retrieve the indicated data block
  • Search the block for the record with key K

13
Comparing Sparse and Dense Indexes
  • Sparse index uses much less space
  • In the previous numeric example, sparse index
    size is now only 1000 index blocks, about 4
    Mbytes
  • Dense index, unlike sparse, lets us answer "is
    there a record with key K?" without having to
    retrieve a data block

14
Multiple Levels of Index
  • Make an index for the index
  • Can continue this idea for more levels, but
    usually only two levels in practice
  • Second and higher level indexes must be sparse,
    otherwise no savings

15
Two-Level Index Example
16
Numeric Example Again
  • Suppose we put a second-level index on the
    first-level sparse index
  • Since first-level index uses 1000 blocks and 100
    key-pointer pairs fit per block, we need 10
    blocks for second-level index
  • Very likely to keep the second-level index in
    memory
  • Thus search requires at most two disk I/O's (one
    for block of first-level index, one for data
    block)

17
Duplicate Search Keys
  • What if more than one record has a given search
    key value? (Then the search key is not a key of
    the relation.)
  • Solution 1 Use a dense index and allow
    duplicate search keys in it.
  • To find all data records with search key K,
    follow all the pointers in the index with search
    key K

18
Solution 1 Example
19
Duplicate Search Keys with Dense Index
  • Solution 2 only keep record in index for first
    data record with each search key value (saves
    some space in the index)
  • To find all data records with search key K,
    follow the one pointer in the index and then move
    forward in the data file

20
Solution 2 Example
21
Duplicate Search Keys with Sparse Index
  • Recall that index has an entry for just the first
    data record in each block
  • To find all data records with key K
  • find last entry (E1) in index with key K
  • move toward front of index until reaching entry
    (E2) with key lt K
  • Check data blocks pointed to by entries from E2
    to E1 for records with search key K

22
Dupl. Keys w/ Sparse Index
23
Variation on Previous Scheme
  • Index entry for a data block holds smallest
    search key that is new (did not appear in a
    previous block)
  • If there is no new search key in that block, then
    index entry holds the lone search key in the
    block
  • To find all data record with key K
  • search index for first entry whose key is either
    K, or lt K but next key is gt K
  • if a record with key K is in that block then scan
    forward from there

24
Variation Example
25
Inserting and Deleting Data
  • Recall three main techniques
  • create/delete overflow blocks
  • overflow blocks do not have entries in a sparse
    index
  • may be able to insert new blocks in sequential
    order
  • new block needs an entry in a sparse index
  • changing an index can create same problems
  • make room in a full block by sliding some data to
    an adjacent block combine adjacent blocks if
    they get too empty

26
General Strategy
  • When data file changes, index must adapt
  • Details depend on whether index is sparse or
    dense and how data file modifications are
    implemented
  • Index file is itself sequential, so same
    strategies as for modifying data files can be
    applied to index files

27
Effects of Actions on Index
Action Dense Index Sparse Index
Create empty overflow block none none
Delete empty overflow block none none
Create empty (main) block none insert
Delete empty (main) block none delete
Insert record insert maybe update
Delete record delete maybe update
Slide record update maybe update
28
Explanations for Actions
  • create/destroy empty overflow block has no effect
    on
  • dense index since it refers to records
  • sparse index since it refers to main records
  • create/destroy empty main block
  • no effect on dense index as above
  • insert/delete entry in sparse index
  • insert/delete/slide record
  • insert/delete/update entry in dense index
  • only change sparse index if affects first record
    in block

29
Deletion from sparse index
10
30
50
70
90

110
130
150
30
Deletion from sparse index
  • delete record 40

10
30
50
70
90

110
130
150
31
Deletion from sparse index
  • delete record 30

10
30
50
70
90

110
130
150
32
Deletion from sparse index
  • delete records 30 40

10
30
50
70
90

110
130
150
33
Deletion from dense index
10
20
30
40
50

60
70
80
34
Deletion from dense index
  • delete record 30

10
20
30
30
40
40
50

60
70
80
35
Insertion, sparse index case

10
30
40
60
36
Insertion, sparse index case
  • insert record 34


10
30
40
60
37
Insertion, sparse index case
  • insert record 15


10
30
40
60
  • Illustrated Immediate reorganization
  • Variation
  • insert new block (chained file)
  • update index

38
Insertion, sparse index case
  • insert record 25


10
30
40
60
39
Insertion, dense index case
  • Similar
  • Often more expensive . . .
Write a Comment
User Comments (0)
About PowerShow.com