CS 245: Database System Principles - PowerPoint PPT Presentation

1 / 71
About This Presentation
Title:

CS 245: Database System Principles

Description:

CS 245: Database System Principles Notes 5: Hashing and More Hector Garcia-Molina CS 245 Notes 5 * – PowerPoint PPT presentation

Number of Views:68
Avg rating:3.0/5.0
Slides: 72
Provided by: Sir127
Category:

less

Transcript and Presenter's Notes

Title: CS 245: Database System Principles


1
CS 245 Database System Principles
Notes 5 Hashing and More
  • Hector Garcia-Molina

2
Hashing
  • key ? h(key)

ltkeygt
Buckets (typically 1 disk block)
. . .
3
  • Two alternatives

. . .
records
(1) key ? h(key)
. . .
4
Two alternatives
record
(2) key ? h(key)
key 1
Index
5
Two alternatives
record
(2) key ? h(key)
key 1
Index
  • Alt (2) for secondary search key

6
Example hash function
  • Key x1 x2 xn n byte character string
  • Have b buckets
  • h add x1 x2 .. xn
  • compute sum modulo b

7
  • ? This may not be best function
  • ? Read Knuth Vol. 3 if you really need to
    select a good function.

8
  • ? This may not be best function
  • ? Read Knuth Vol. 3 if you really need to
    select a good function.

Good hash ? Expected number of
function keys/bucket is the same for all
buckets
9
Within a bucket
  • Do we keep keys sorted?
  • Yes, if CPU time critical
  • Inserts/Deletes not too frequent

10
Next example to illustrate inserts,
overflows, deletes
  • h(K)

11
EXAMPLE 2 records/bucket
0 1 2 3
  • INSERT
  • h(a) 1
  • h(b) 2
  • h(c) 1
  • h(d) 0

12
EXAMPLE 2 records/bucket
0 1 2 3
  • INSERT
  • h(a) 1
  • h(b) 2
  • h(c) 1
  • h(d) 0

h(e) 1
13
EXAMPLE 2 records/bucket
0 1 2 3
  • INSERT
  • h(a) 1
  • h(b) 2
  • h(c) 1
  • h(d) 0

h(e) 1
14
EXAMPLE deletion
Deleteef
0 1 2 3
a
b
d
c
e
f
g
15
EXAMPLE deletion
Deleteef
0 1 2 3
a
b
d
c
c
e
f
g
16
EXAMPLE deletion
Deleteef
0 1 2 3
a
b
d
c
c
e
f
g
17
Rule of thumb
  • Try to keep space utilization
  • between 50 and 80
  • Utilization keys used
  • total keys that fit

18
Rule of thumb
  • Try to keep space utilization
  • between 50 and 80
  • Utilization keys used
  • total keys that fit

19
How do we cope with growth?
  • Overflows and reorganizations
  • Dynamic hashing

20
How do we cope with growth?
  • Overflows and reorganizations
  • Dynamic hashing

21
Extensible hashing two ideas
  • (a) Use i of b bits output by hash function
  • b
  • h(K) ?
  • use i ? grows over time.

00110101
22
  • (b) Use directory
  • h(K)i to bucket

. . .
. . .
23
Example h(k) is 4 bits 2 keys/bucket
1
  • i

0001
1
1001
1100
Insert 1010
24
Example h(k) is 4 bits 2 keys/bucket
1
  • i

0001
1
1001
1100
Insert 1010
25
Example h(k) is 4 bits 2 keys/bucket
1
  • i

0001
1
1001
1100
Insert 1010
26
Example continued
i
2
00 01 10 11
1
0001
1001
1010
Insert 0111 0000
1100
27
Example continued
i
2
00 01 10 11
1
0001
0111
1001
1010
Insert 0111 0000
1100
28
Example continued
i
2
00 01 10 11
1
0001
0111
1001
1010
Insert 0111 0000
1100
29
Example continued
2
0000
0001
i
2
00 01 10 11
2
0111
Insert 1001
30
Example continued
2
0000
0001
i
2
00 01 10 11
2
0111
Insert 1001
31
Example continued
2
0000
0001
i
2
00 01 10 11
2
0111
Insert 1001
32
Extensible hashing deletion
  • No merging of blocks
  • Merge blocks and cut directory if possible
  • (Reverse insert procedure)

33
Deletion example
  • Run thru insert example in reverse!

34
Note Still need overflow chains
  • Example many records with duplicate keys

if we split
insert 1100
1101
1100
1100
1100
35
Solution overflow chains
add overflow block
insert 1100
1100
1101
1101
1101
1100
36
Extensible hashing
Summary
  • Can handle growing files
  • - with less wasted space
  • - with no full reorganizations


37
Extensible hashing
Summary
  • Can handle growing files
  • - with less wasted space
  • - with no full reorganizations


38
Linear hashing
  • Another dynamic hashing scheme

39
Linear hashing
  • Another dynamic hashing scheme

40
Example b4 bits, i 2, 2 keys/bucket
Future growth buckets
0101
0000
1111
1010
  • 00 01 10 11

m 01 (max used block)
41
Example b4 bits, i 2, 2 keys/bucket
Future growth buckets
0101
0000
1111
1010
  • 00 01 10 11

m 01 (max used block)
42
Example b4 bits, i 2, 2 keys/bucket
  • insert 0101

Future growth buckets
0101
0000
1111
1010
  • 00 01 10 11

m 01 (max used block)
43
Example b4 bits, i 2, 2 keys/bucket
  • insert 0101

Future growth buckets
0101
0000
1111
1010
  • 00 01 10 11

m 01 (max used block)
44
Note
  • In textbook, n is used instead of m
  • nm1

n10
Future growth buckets
0101
0000
1111
1010
00 01 10 11
m 01 (max used block)
45
Example b4 bits, i 2, 2 keys/bucket
Future growth buckets
0101
0000
1111
1010
  • 00 01 10 11

m 01 (max used block)
46
Example b4 bits, i 2, 2 keys/bucket
Future growth buckets
0101
0000
1111
1010
  • 00 01 10 11

m 01 (max used block)
47
Example b4 bits, i 2, 2 keys/bucket
Future growth buckets
0101
0000
1111
1010
  • 00 01 10 11

m 01 (max used block)
48
Example b4 bits, i 2, 2 keys/bucket
Future growth buckets
0101
0000
1111
1010
  • 00 01 10 11

m 01 (max used block)
49
Example b4 bits, i 2, 2 keys/bucket
Future growth buckets
0101
0000
1111
1010
  • 00 01 10 11

m 01 (max used block)
50
Example Continued How to grow beyond this?
i 2
1111
1010
0101
0000
0101
  • 00 01 10 11

. . .
m 11 (max used block)
51
Example Continued How to grow beyond this?
i 2
1111
1010
0101
0000
0101
  • 00 01 10 11

. . .
m 11 (max used block)
52
Example Continued How to grow beyond this?
i 2
1111
1010
0101
0000
0101
  • 00 01 10 11

. . .
m 11 (max used block)
53
Example Continued How to grow beyond this?
i 2
1111
1010
0101
0000
0101
  • 00 01 10 11

. . .
m 11 (max used block)
54
? When do we expand file?
  • Keep track of used slots
  • total of slots

U
55
? When do we expand file?
  • Keep track of used slots
  • total of slots

U
  • If U gt threshold then increase m
  • (and maybe i )

56
Linear Hashing
Summary
  • Can handle growing files
  • - with less wasted space
  • - with no full reorganizations
  • No indirection like extensible hashing



57
Example BAD CASE
  • Very full
  • Very empty Need to move
  • m here
  • Would waste
  • space...

58
Summary
  • Hashing
  • - How it works
  • - Dynamic hashing
  • - Extensible
  • - Linear

59
Next
  • Indexing vs Hashing
  • Index definition in SQL
  • Multiple key access

60
Indexing vs Hashing
  • Hashing good for probes given key
  • e.g., SELECT
  • FROM R
  • WHERE R.A 5

61
Indexing vs Hashing
  • INDEXING (Including B Trees) good for
  • Range Searches
  • e.g., SELECT
  • FROM R
  • WHERE R.A gt 5

62
Index definition in SQL
  • Create index name on rel (attr)
  • Create unique index name on rel (attr)

defines candidate key
  • Drop INDEX name

63
  • CANNOT SPECIFY TYPE OF INDEX
  • (e.g. B-tree, Hashing, )
  • OR PARAMETERS
  • (e.g. Load Factor, Size of Hash,...)
  • ... at least in SQL...

Note
64
  • ATTRIBUTE LIST ? MULTIKEY INDEX
  • (next)
  • e.g., CREATE INDEX foo ON R(A,B,C)

Note
65

Multi-key Index
  • Motivation Find records where
  • DEPT Toy AND SAL gt 50k

66
Strategy I
  • Use one index, say Dept.
  • Get all Dept Toy records and
    check their salary

I1
67
Strategy II
  • Use 2 Indexes Manipulate Pointers
  • Toy Sal
  • gt 50k

68
Strategy III
  • Multiple Key Index
  • One idea

I2
I3
I1
69
Example
10k
15k
  • Example
  • Record
  • Dept
  • Index
  • Salary
  • Index

17k
21k
NameJoe DEPTSales SAL15k
12k
15k
15k
19k
70
For which queries is this index good?
  • Find RECs Dept Sales SAL20k
  • Find RECs Dept Sales SAL gt 20k
  • Find RECs Dept Sales
  • Find RECs SAL 20k

71
The BIG picture.
  • Chapters 11 12 13 Storage, records,
    blocks...
  • Chapters 13 14 14 Access Mechanisms -
    Indexes
  • - B trees
  • - Hashing
  • - Multi key
  • Chapters 15 16 15, 16 Query Processing

NEXT
Write a Comment
User Comments (0)
About PowerShow.com