External Memory Hashing PowerPoint PPT Presentation


Title: External Memory Hashing


1
External Memory Hashing
2
Hash Tables
  • Hash function h: search key → 0..B-1.
  • Buckets are blocks, numbered 0..B-1.
  • Big idea: if a record with search key K exists,
    then it must be in bucket h(K).
  • Lookup costs one disk I/O if there is only one
    block per bucket.

Hash Table Lookup: for record(s) with search key
K, compute h(K) and search that bucket.
3
Hash Table Insertion
  • Put the record in bucket h(K) if it fits; otherwise
    create an overflow block.
  • Overflow block(s) are part of the bucket. Example:
    insert a record with search key g.
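
The lookup and insertion rules above can be sketched in a few lines of Python. This is only an in-memory illustration, not the presenter's implementation: the names `Block` and `StaticHashTable`, the block capacity of 2, and the use of Python's built-in `hash` as h are all assumptions made for the example. Each `Block` stands in for one disk block, and `lookup` also reports how many blocks it had to read.

```python
class Block:
    """Stand-in for one disk block (hypothetical helper)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []      # (key, value) pairs stored in this block
        self.overflow = None   # next overflow block of the same bucket


class StaticHashTable:
    """Hash table with a fixed number B of buckets and overflow chains."""

    def __init__(self, num_buckets, block_capacity=2, hash_fn=hash):
        self.B = num_buckets
        self.block_capacity = block_capacity
        self.hash_fn = hash_fn
        self.buckets = [Block(block_capacity) for _ in range(num_buckets)]

    def _h(self, key):
        # Map the search key to a bucket number in 0..B-1.
        return self.hash_fn(key) % self.B

    def insert(self, key, value):
        block = self.buckets[self._h(key)]
        # Walk the chain until a block with free space is found,
        # creating an overflow block when the chain is exhausted.
        while len(block.records) >= block.capacity:
            if block.overflow is None:
                block.overflow = Block(self.block_capacity)
            block = block.overflow
        block.records.append((key, value))

    def lookup(self, key):
        """Return (matching values, number of blocks read)."""
        block = self.buckets[self._h(key)]
        found, reads = [], 0
        while block is not None:
            reads += 1
            found.extend(v for k, v in block.records if k == key)
            block = block.overflow
        return found, reads
```

With one block per bucket a lookup reads exactly one block; every overflow block in the chain adds one more read, which is why long chains hurt.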

4
What if the File Grows too Large?
  • Efficiency is highest if
    #records < #buckets × (#records/block).
  • If the file grows, we need a dynamic hashing method
    to maintain the above relationship.
  • Extensible hashing: double the number of buckets
    when needed.
  • Linear hashing: add one more bucket as
    appropriate.
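
As a small worked check of the relationship above, the following sketch computes the average fill of the table and tests whether the inequality still holds. The function names and the `threshold` parameter are assumptions introduced for this example only.

```python
def load_factor(num_records, num_buckets, records_per_block):
    """Fraction of the available record slots that are in use."""
    return num_records / (num_buckets * records_per_block)


def needs_more_buckets(num_records, num_buckets, records_per_block,
                       threshold=1.0):
    """True once #records < threshold * #buckets * (#records/block) fails."""
    return num_records >= threshold * num_buckets * records_per_block
```

For example, 3 records in 2 buckets of 2 records each give a load factor of 0.75, so the inequality holds; a fourth record fills every slot and signals that a dynamic scheme should add buckets.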

5
Dynamic Hashing Framework
  • Hash function h produces a sequence of k bits.
  • Only some of the bits are used at any time to
    determine placement of keys in buckets.
  • Extensible Hashing (buckets may share blocks!)
  • Keep a parameter i = number of bits from the
    beginning of h(K) that determine the bucket.
  • The bucket array is now an array of pointers to buckets.
  • A block can serve several buckets.
  • For each block, a parameter j <= i tells how many
    bits of h(K) determine membership in the block.
  • I.e., a block represents 2^(i-j) buckets that share
    the first j bits of their number.
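
The roles of i and j can be made concrete with two small helpers. This is a sketch under stated assumptions: hash values are treated as k-bit integers with k = 4, and the names `bucket_index` and `entries_sharing_block` are hypothetical.

```python
K_BITS = 4  # assumed width k of the hash value


def bucket_index(h, i, k=K_BITS):
    """Bucket-array index: the first (high-order) i bits of the k-bit value h."""
    return h >> (k - i)


def entries_sharing_block(prefix, j, i):
    """Bucket-array entries that point to the block holding all records
    whose hash values start with the j-bit pattern `prefix`."""
    span = 1 << (i - j)            # the block represents 2^(i-j) buckets
    start = prefix << (i - j)      # first entry whose number starts with prefix
    return list(range(start, start + span))
```

With i = 3, a block with j = 1 and prefix 1 is shared by the four entries 100, 101, 110 and 111, while a block with j = i is pointed to by exactly one entry.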

6
Example
  • An extensible hash table when i = 1.

7
Extensible Hash Table Insert
  • If the record with key K fits in the block B pointed
    to by h(K), put it there.
  • If not, let this block B represent j bits.
  • Case 1: j = i
    a. Set i = i + 1.
    b. Double the bucket array, so that it now has 2^(i+1)
       entries, where i is the value before step a.
    c. Let w be an old array entry. Both new
       entries, w0 and w1, point to the same block that
       w used to point to.
    d. Split B into two and distribute the records of
       B according to their (j+1)st bit. Set j = j + 1
       for B and for the new block. Fix the pointers in
       the bucket array, so that entries that formerly
       pointed to B now point either to B or to the new
       block, depending on that same (j+1)st bit.
  • Case 2: j < i
    Do as in 1.d.
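
The insertion procedure can be sketched as follows. This is a minimal in-memory model rather than the presenter's code: records are represented only by their k-bit hash values, k = 4 and a block capacity of 2 are assumed, and the class and method names are hypothetical. The table starts with i = 0 and a single block.

```python
class Block:
    def __init__(self, j):
        self.j = j          # number of leading hash bits shared by all records
        self.records = []   # k-bit hash values standing in for records


class ExtensibleHashTable:
    def __init__(self, k=4, block_capacity=2):
        self.k = k
        self.capacity = block_capacity
        self.i = 0                      # bits currently used by the bucket array
        self.directory = [Block(0)]     # bucket array of 2^i block pointers

    def _index(self, h):
        return h >> (self.k - self.i)   # first i bits of h

    def lookup(self, h):
        return h in self.directory[self._index(h)].records

    def insert(self, h):
        while True:
            block = self.directory[self._index(h)]
            if len(block.records) < self.capacity:
                block.records.append(h)
                return
            if block.j == self.k:
                # All k bits are already in use; a real table would chain
                # an overflow block here (not modeled in this sketch).
                raise OverflowError("cannot split further")
            if block.j == self.i:
                # Case j = i: double the array; entry w becomes w0 and w1,
                # both pointing to the block w used to point to.
                self.directory = [b for b in self.directory for _ in (0, 1)]
                self.i += 1
            self._split(block)          # step 1.d, also used when j < i

    def _split(self, block):
        shift = self.k - block.j - 1    # position of the (j+1)st bit
        new_block = Block(block.j + 1)
        old = block.records
        block.records = [r for r in old if not (r >> shift) & 1]
        new_block.records = [r for r in old if (r >> shift) & 1]
        block.j += 1
        # Fix pointers: entries whose (j+1)st bit is 1 move to the new block.
        dir_shift = self.i - block.j
        for idx, b in enumerate(self.directory):
            if b is block and (idx >> dir_shift) & 1:
                self.directory[idx] = new_block
```

The loop retries after each split because a split does not guarantee room: all records of B may land on the same side, in which case the block is split again on the next bit.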

8
Example
  • Insert a record with h(K) = 1010.

9
Example: Next
  • Next, insert records with
    h(K) = 0000 and h(K) = 0111.
  • The bucket for 0... gets split,
    but i stays at 2.
  • Then insert a record with h(K) = 1000.
  • It overflows the bucket for 10...
  • Raise i to 3.

10
Extensible Hash Tables
  • Advantages
  • Lookup: never search more than one data block.
  • Hope that the bucket array fits in main memory.
  • Defects
  • Doubling the bucket array could make the array
    too large to fit in main memory.
  • Problem with skewed key distributions.
  • E.g., let 1 block = 2 records. Suppose that three
    records have hash values that happen to agree in
    the first 20 bits.
  • In that case i would have to be at least 20, giving
    one million (2^20) or more bucket-array entries,
    even though we have only 3 records!

11
Linear Hashing
  • Use i bits from the right (low-order) end of h(K).
  • Buckets are numbered 0..n-1, where 2^(i-1) < n <= 2^i.
  • Let the last i bits of h(K) be m = a1 a2 ... ai.
  • If m < n, then the record belongs to bucket m.
  • If n <= m < 2^i, then the record belongs to bucket
    m - 2^(i-1), that is, the bucket we would get if we
    changed a1 (which must be 1) to 0.
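
The bucket-selection rule translates directly into a short function. This is an illustrative sketch; the name `linear_bucket` is hypothetical and hash values are assumed to be non-negative integers.

```python
def linear_bucket(h, i, n):
    """Bucket number for hash value h, given i bits in use and n buckets."""
    m = h & ((1 << i) - 1)        # m = last i bits of h(K)
    if m < n:
        return m                  # bucket m already exists
    return m - (1 << (i - 1))     # otherwise change the leading bit a1 to 0
```

For instance, with i = 2 and n = 3, a hash value ending in 10 goes to bucket 10, while one ending in 11 has no bucket of its own yet and goes to bucket 01.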

(Figure labels: n = # of buckets, r = # of records. These counts are also part of the structure.)
12
Linear Hash Table Insert
  • Pick an upper limit on capacity,
    e.g., 85% (1.7 records/bucket in our example).
  • If an insertion exceeds the capacity limit, set
    n = n + 1.
  • If the new n is 2^i + 1, set i = i + 1.
  • No change in bucket numbers is needed; just
    imagine a leading 0.
  • Need to split bucket n - 2^(i-1), with n the old
    value, because there is now a bucket numbered
    (old) n.
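
The insertion steps can be sketched as a small class. Several details are assumptions made for the example and are not stated on the slides: the table starts with i = 1 and n = 2, a block holds 2 records, records are represented by their hash values only, and each bucket is a plain Python list, so overflow blocks are implicit (anything beyond 2 entries would sit in an overflow block).

```python
class LinearHashTable:
    def __init__(self, block_capacity=2, threshold=0.85):
        self.block_capacity = block_capacity
        self.threshold = threshold   # upper limit on average fill, e.g. 85%
        self.i = 1                   # low-order hash bits in use
        self.n = 2                   # current number of buckets
        self.r = 0                   # current number of records
        self.buckets = [[], []]

    def _bucket(self, h):
        m = h & ((1 << self.i) - 1)
        return m if m < self.n else m - (1 << (self.i - 1))

    def lookup(self, h):
        return h in self.buckets[self._bucket(h)]

    def insert(self, h):
        self.buckets[self._bucket(h)].append(h)
        self.r += 1
        # Capacity limit exceeded: add one bucket.
        if self.r > self.threshold * self.n * self.block_capacity:
            self._add_bucket()

    def _add_bucket(self):
        new_index = self.n                      # the new bucket is numbered (old) n
        self.n += 1
        self.buckets.append([])
        if self.n == (1 << self.i) + 1:
            self.i += 1                         # existing numbers gain a leading 0
        split_index = new_index - (1 << (self.i - 1))
        old = self.buckets[split_index]
        self.buckets[split_index] = []
        for rec in old:                         # redistribute over both buckets
            self.buckets[self._bucket(rec)].append(rec)
```

Only one bucket is split per growth step, and it is chosen by position rather than by fullness, so a crowded bucket may keep its overflow blocks until its turn comes.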

13
Example
  • Insert records with h(K) = 0000, 1010, 1111,
    0101, 0001, 1100.

(Figure: the table before and after the insertion.)
14
Example
  • Insert records with h(K) = 0000, 1010, 1111,
    0101, 0001, 1100.

(Figure: the table before and after the insertion.)
Capacity limit exceeded: increment n.
15
Example
  • Insert records with h(K) = 0000, 1010, 1111,
    0101, 0001, 1100.

(Figure: the table before and after the insertion.)
16
Example
  • Insert records with h(K) = 0000, 1010, 1111,
    0101, 0001, 1100.

(Figure: the table before and after the insertion.)
Capacity limit exceeded: increment n, which
causes i to be incremented as well.
17
Example
  • Insert records with h(K) = 0000, 1010, 1111,
    0101, 0001, 1100.

(Figure: the table before and after the insertion.)
As long as the capacity limit is not exceeded, we can add
overflow blocks.
18
Example
  • Insert records with h(K) = 0000, 1010, 1111,
    0101, 0001, 1100.

(Figure: the table before and after the insertion.)
Capacity limit exceeded: increment n.
19
Lookup in Linear Hash Table
  • For record(s) with search key K, compute h(K) and
    search the corresponding bucket, located by the
    procedure described for insertion.
  • If the record we wish to look up isn't there, it
    can't be anywhere else.
  • E.g., look up a key that hashes to 1010, and
    then a key that hashes to 1011.

(Figure parameters: i = 2, n = 3, r = 4.)
20
Exercise
  • Suppose we want to insert keys with hash values
    0000, ..., 1111 into a linear hash table with a 100%
    capacity threshold.
  • Assume that a block can hold three records.