Hash Tables 1 - PowerPoint PPT Presentation

Provided by: cisJu
1
Hash Tables 1
2
Dictionary
  • Dictionary
    • Dynamic-set data structure for storing items indexed by keys.
    • Supports the operations Insert, Search, and Delete.
  • Applications
    • Symbol table of a compiler.
    • Memory-management tables in operating systems.
    • Large-scale distributed systems.
  • Hash Tables
    • An effective way of implementing dictionaries.
    • A generalization of ordinary arrays.

3
Direct-address Tables
  • Direct-address tables are ordinary arrays.
  • They facilitate direct addressing.
  • The element whose key is k is obtained by indexing into the kth position of the array.
  • Applicable when we can afford to allocate an array with one position for every possible key, i.e., when the universe of keys U is small.
  • Dictionary operations can be implemented to take O(1) time.
  • Details in Sec. 11.1.
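The O(1) operations above can be sketched in a few lines. This is a minimal illustrative Python sketch (the class and method names are my own, not from the slides): one array slot per possible key in {0, …, m-1}.

```python
class DirectAddressTable:
    # Direct addressing: one array slot for every possible key 0..m-1.
    def __init__(self, m):
        self.slots = [None] * m

    def insert(self, key, value):   # O(1): index directly by key
        self.slots[key] = value

    def search(self, key):          # O(1)
        return self.slots[key]

    def delete(self, key):          # O(1)
        self.slots[key] = None

t = DirectAddressTable(10)
t.insert(7, "seven")
print(t.search(7))  # seven
```

The cost is space: the array must be as large as the whole universe U, which is exactly why hashing is introduced next.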

4
Hash Tables
  • Notation
  • U: universe of all possible keys.
  • K: set of keys actually stored in the dictionary.
  • |K| = n.
  • When U is very large, arrays are not practical.
  • |K| << |U|.
  • Use a table of size proportional to |K|: the hash table.
  • However, we lose the direct-addressing ability.
  • Define functions that map keys to slots of the hash table.

5
Hash Tables
  • Let U be the universe of keys and let the hash table be an array of size m. A hash function h is a function from U to {0, ..., m-1}, that is, h: U → {0, ..., m-1}.

[Figure: keys k1, ..., k6 drawn from U (universe of keys) mapped into slots 0-7 of the table: h(k2) = 2, h(k1) = h(k3) = 3, h(k6) = 5, h(k4) = 7.]
6
Hash Tables Example
For example, if we hash keys in the range 0-1000 into a hash
table with 5 entries and use h(key) = key mod 5,
we get the following sequence of events:
Insert 21 → slot 1.
Insert 54 → slot 4.
Inserting any further key congruent to 4 mod 5 (e.g., 24) gives a collision at array entry 4.
7
Hashing
  • Hash function h: a mapping from U to the slots of a hash table T[0..m-1].
  • h: U → {0, 1, ..., m-1}
  • With arrays, key k maps to slot A[k].
  • With hash tables, key k maps, or "hashes", to slot T[h(k)].
  • h(k) is the hash value of key k.

8
Hashing
[Figure: keys k1, ..., k5 from K (actual keys), a subset of U (universe of keys), hashed into slots 0 to m-1; h(k2) = h(k5) produces a collision.]
9
Issues with Hashing
  • Multiple keys can hash to the same slot: collisions are possible.
  • Design hash functions such that collisions are minimized.
  • But avoiding collisions altogether is impossible.
  • Design collision-resolution techniques.
  • Search will cost Θ(n) time in the worst case.
  • However, all operations can be made to have an expected complexity of Θ(1).

10
Methods of Resolution
  • Chaining
  • Store all elements that hash to the same slot in
    a linked list.
  • Store a pointer to the head of the linked list in
    the hash table slot.
  • Open Addressing
  • All elements stored in hash table itself.
  • When collisions occur, use a systematic
    (consistent) procedure to store elements in free
    slots of the table.

[Figure: chained hash table with slots 0 to m-1; keys k1, ..., k8 stored in linked lists hanging off their slots.]
11
Collision Resolution by Chaining
[Figure: collision resolution by chaining: h(k1) = h(k4), h(k2) = h(k5) = h(k6), h(k3) = h(k7); each group of colliding keys shares one slot, marked X, from 0 to m-1.]
12
Collision Resolution by Chaining
[Figure: the same table with the linked lists shown explicitly: k1 and k4 in one chain, k2, k5, and k6 in another, k3 and k7 in a third, and k8 alone in its slot.]
13
Hashing with Chaining
  • What is the running time to insert/search/delete?
  • Insert: takes O(1) time to compute the hash function and insert at the head of the linked list.
  • Search: proportional to the maximum linked-list length.
  • Delete: same as search.
  • Therefore, in the unfortunate event that we have a bad hash function, all n keys may hash to the same table entry, giving an O(n) run time!
  • So how can we create a good hash function?

14
Hashing with Chaining
  • Dictionary Operations
  • Chained-Hash-Insert(T, x)
  • Insert x at the head of list T[h(key[x])].
  • Worst-case complexity: O(1).
  • Chained-Hash-Delete(T, x)
  • Delete x from the list T[h(key[x])].
  • Worst-case complexity: proportional to the length of the list with singly linked lists; O(1) with doubly linked lists.
  • Chained-Hash-Search(T, k)
  • Search for an element with key k in list T[h(k)].
  • Worst-case complexity: proportional to the length of the list.
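The three chained operations can be sketched as a small Python class. This is a minimal sketch, not the book's pseudocode; the names and the division-method hash are my own choices, and chains are plain Python lists of (key, value) pairs.

```python
class ChainedHashTable:
    # Collision resolution by chaining: each slot holds a list of
    # (key, value) pairs whose keys hash to that slot.
    def __init__(self, m):
        self.m = m
        self.table = [[] for _ in range(m)]

    def _h(self, key):
        # Division-method hash (an assumption; any h: U -> {0..m-1} works).
        return key % self.m

    def insert(self, key, value):
        # O(1): prepend to the head of the chain.
        self.table[self._h(key)].insert(0, (key, value))

    def search(self, key):
        # Proportional to the chain length.
        for k, v in self.table[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        # Proportional to the chain length (singly linked behavior).
        chain = self.table[self._h(key)]
        self.table[self._h(key)] = [(k, v) for k, v in chain if k != key]

t = ChainedHashTable(5)
t.insert(21, "a")
t.insert(54, "b")
t.insert(24, "c")   # 24 mod 5 == 4: collides with 54, chained in slot 4
print(t.search(54))  # b
```

Insert is O(1) because it never scans the chain; search and delete walk the chain, which is where the Θ(1 + α) expected cost comes from.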

15
Analysis on Chained-Hash-Search
  • Load factor α = n/m: average number of keys per slot.
  • m: number of slots.
  • n: number of elements stored in the hash table.
  • Worst-case complexity: Θ(n), plus the time to compute h(k).
  • The average case depends on how h distributes keys among the m slots.
  • Assume:
  • Simple uniform hashing: any key is equally likely to hash into any of the m slots, independent of where any other key hashes to.
  • O(1) time to compute h(k).
  • Time to search for an element with key k is proportional to the length of T[h(k)].
  • Expected length of a linked list = load factor α = n/m.

16
Expected Cost of an Unsuccessful Search
Theorem: An unsuccessful search takes expected time Θ(1 + α).
  • Proof:
  • Any key not already in the table is equally likely to hash to any of the m slots.
  • To search unsuccessfully for any key k, we need to search to the end of the list T[h(k)], whose expected length is α.
  • Adding the time to compute the hash function, the total time required is Θ(1 + α).

17
Expected Cost of a Successful Search
Theorem: A successful search takes expected time Θ(1 + α).
  • Proof:
  • The probability that a list is searched is proportional to the number of elements it contains.
  • Assume that the element being searched for is equally likely to be any of the n elements in the table.
  • The number of elements examined during a successful search for an element x is 1 more than the number of elements that appear before x in x's list.
  • These are the elements inserted after x was inserted.
  • Goal:
  • Find the average, over the n elements x in the table, of how many elements were inserted into x's list after x was inserted.

18
Expected Cost of a Successful Search
Theorem: A successful search takes expected time Θ(1 + α).
  • Proof (contd.):
  • Let xi be the i-th element inserted into the table, and let ki = key[xi].
  • Define indicator random variables Xij = I{h(ki) = h(kj)}, for all i, j.
  • Simple uniform hashing ⇒ Pr{h(ki) = h(kj)} = 1/m ⇒ E[Xij] = 1/m.
  • The expected number of elements examined in a successful search is
    E[(1/n) Σ_{i=1..n} (1 + Σ_{j=i+1..n} Xij)],
    where the inner sum Σ_{j=i+1..n} Xij counts the elements inserted after xi into the same slot as xi.
19
Proof Contd.
(1/n) Σ_{i=1..n} (1 + Σ_{j=i+1..n} E[Xij])   (linearity of expectation)
= (1/n) Σ_{i=1..n} (1 + (n - i)/m)
= 1 + (n - 1)/(2m)
= 1 + α/2 - α/(2n).
Expected total time for a successful search
= time to compute the hash function + time to search
= O(2 + α/2 - α/(2n)) = O(1 + α).
20
Expected Cost Interpretation
  • If n = O(m), then α = n/m = O(m)/m = O(1).
  • ⇒ Searching takes constant time on average.
  • Insertion is O(1) in the worst case.
  • Deletion takes O(1) worst-case time when lists are doubly linked.
  • Hence, all dictionary operations take O(1) time on average with hash tables with chaining.

21
Good Hash Functions
  • Satisfy the assumption of simple uniform hashing.
  • It is not possible to satisfy the assumption in practice.
  • We often use heuristics, based on the domain of the keys, to create a hash function that performs well.
  • Regularity in the key distribution should not affect uniformity: the hash value should be independent of any patterns that might exist in the data.
  • E.g., suppose each key is drawn independently from U according to a probability distribution P. Then we want
    Σ_{k: h(k) = j} P(k) = 1/m for j = 0, 1, ..., m-1.
  • An example is the division method.

22
Keys as Natural Numbers
  • Hash functions assume that the keys are natural numbers.
  • When they are not, we have to interpret them as natural numbers.
  • Example: interpret a character string as an integer expressed in some radix notation. Suppose the string is "CLRS":
  • ASCII values: C = 67, L = 76, R = 82, S = 83.
  • There are 128 basic ASCII values.
  • So, "CLRS" = 67·128³ + 76·128² + 82·128¹ + 83·128⁰ = 141,764,947.
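The radix interpretation above is just Horner's rule applied to the character codes. A short Python sketch (the function name is my own):

```python
def string_to_key(s, radix=128):
    # Interpret a character string as an integer in the given radix,
    # using each character's ASCII code as one digit (Horner's rule).
    key = 0
    for ch in s:
        key = key * radix + ord(ch)
    return key

print(string_to_key("CLRS"))  # 141764947
```

Any hash function for natural numbers (division method, multiplication method) can then be applied to the resulting integer.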

23
Division Method
  • Map a key k into one of the m slots by taking the remainder of k divided by m. That is,
  • h(k) = k mod m
  • Example: m = 31 and k = 78 ⇒ h(k) = 16.
  • Advantage: fast, since it requires just one division operation.
  • Disadvantage: we have to avoid certain values of m.
  • Don't pick certain values, such as m = 2^p,
  • or the hash won't depend on all the bits of k.
  • Good choice for m:
  • primes not too close to a power of 2 (or 10) are good.

24
Multiplication Method
  • If 0 < A < 1, h(k) = ⌊m (kA mod 1)⌋ = ⌊m (kA - ⌊kA⌋)⌋,
  • where "kA mod 1" means the fractional part of kA, i.e., kA - ⌊kA⌋.
  • Disadvantage: slower than the division method.
  • Advantage: the value of m is not critical.
  • m is typically chosen as a power of 2, i.e., m = 2^p, which makes the implementation easy.
  • Example: m = 1000, k = 123, A ≈ 0.6180339887...
  • h(k) = ⌊1000 · (123 · 0.6180339887 mod 1)⌋
  • = ⌊1000 · 0.0181806...⌋ = 18.

25
Multiplication Mthd. Implementation
  • Choose m = 2^p, for some integer p.
  • Let the word size of the machine be w bits.
  • Assume that k fits into a single word (k takes w bits).
  • Let s be an integer with 0 < s < 2^w (s takes w bits).
  • Restrict A to be of the form s/2^w.
  • Then k · s = k · A · 2^w = r1 · 2^w + r0.
  • r1 holds the integer part of kA (⌊kA⌋) and r0 holds the fractional part of kA (kA mod 1 = kA - ⌊kA⌋), scaled by 2^w.
  • We don't care about the integer part of kA,
  • so we just use r0 and forget about r1.

26
Multiplication Mthd Implementation
[Figure: the w-bit key k multiplied by s = A·2^w gives a 2w-bit product r1·2^w + r0; h(k) is extracted as the p most significant bits of r0, the part just right of the binary point.]
  • We want ⌊m (kA mod 1)⌋. We could get that by shifting r0 to the left by p = lg m bits and then taking the p bits that were shifted to the left of the binary point.
  • But we don't need to shift: just take the p most significant bits of r0.
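The word-level recipe above uses only integer multiplication, masking, and a shift. A Python sketch under the stated assumptions (w = 32, m = 2^9 = 512 to match the k = 123456 example on a later slide; s = ⌊A·2^32⌋ = 2654435769 for Knuth's A, which is my instantiation, not a value given in the slides):

```python
def hash_mul_bits(k, s, w=32, p=9):
    # Multiplication method with m = 2^p, A = s/2^w, using only integer ops.
    # k*s = r1*2^w + r0; h(k) is the p most significant bits of r0.
    mask = (1 << w) - 1
    r0 = (k * s) & mask        # low word: fractional part of k*A, scaled by 2^w
    return r0 >> (w - p)       # p most significant bits of r0

s = 2654435769                 # floor(((sqrt(5) - 1) / 2) * 2**32)
print(hash_mul_bits(123456, s))  # 2
```

No division and no floating point are needed, which is why this variant is the one used in practice when m is a power of 2.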

27
How to choose A?
  • How do we choose A?
  • The multiplication method works with any legal value of A.
  • But it works better with some values than with others, depending on the keys being hashed.
  • Knuth suggests using A ≈ (√5 - 1)/2 ≈ 0.618.

28
Multiplication Method
  • We choose m to be a power of 2 (m = 2^p) and A of the form s/2^w.
  • For example, with k = 123456 and m = 512 (so p = 9), h(k) is the top 9 bits of the fractional part of kA.

29
Multiplication Method Implementation
30
Drawback of Chaining
  • Drawbacks of Separate Chaining
  • The new operator takes a long time to allocate memory in some languages.
  • We are basically using two data structures: an array and linked lists.
  • Therefore, separate-chained hash tables, although useful, are not used widely.

31
Open Addressing
  • Open addressing means that when a collision occurs at a certain location, we try alternate locations until an empty location is found.
  • As opposed to separate chaining, we now maintain only one table (array); there are no associated lists at each array index.
  • Alternate locations are found by using a collision resolution strategy, denoted by a function f().

32
Hash Functions Using Collision Resolution
Strategy
  • Using a collision resolution strategy, the hash function gets modified to hi(x):
  • hi(x) = (hash(x) + f(i)) mod tableSize.
  • Here:
  • hi(x): new hash function
  • hash(x): old hash function, probably something like hash(x) = x mod tableSize
  • f(i): collision resolution strategy

33
Collision Resolution Strategy (contd.)
  • i denotes the number of attempts made by the collision resolution strategy. When a collision occurs and we try to find an empty location (using the collision resolution strategy) for the first time, i = 1. If this first attempt fails, we try to find an empty location a second time, at which point i = 2, and so on.
  • You must have noticed that the collision resolution strategy should be a function of i (the number of the attempt). That is why the collision resolution function is denoted f(i).

34
Hash Tables and Collision Resolution
  • Some characteristics of hash tables with open-addressing collision resolution:
  • All data goes inside the table, so a larger table is required.
  • λ ≤ 0.5 for open addressing.
  • We will now investigate different collision resolution strategies. In other words, we will take various functions for f(i) and see how the hash table performs.

35
Collision Resolution Strategy 1 Linear Probing
  • In linear probing f is linear: f(i) = i.
  • This means that when there is a collision, we try successive locations starting from the location of the collision until we find an empty location.

36
Linear Probing Example
  • Example: insert the following data into a hash table using linear probing as the collision resolution strategy. Assume tableSize = 10.
  • 17 26 38 9 7 66 11
  • Unless otherwise stated, we will assume that the original hash function is
  • hash(x) = x mod tableSize = x mod 10
  • Since we are using linear probing, we have f(i) = i.
  • Let us now compute hi(x) for each of the input data and place them inside the array.

37
Linear Probing Example (contd.)
  • h0(17) = hash(17) + f(0) = (17 mod 10) + 0 = 7 (remember that f(0) = 0).
  • Location 7 is currently empty, so there is no collision and 17 is entered into the table.
  • Similarly, 26, 38, and 9 do not create any collisions and are entered into the table.
  • The diagram of the table after these four insertions is shown on the next slide.

38
Linear Probing Example (contd.)
[Figure: table after inserting 17, 26, 38, 9: index 6 → 26, 7 → 17, 8 → 38, 9 → 9.]
39
Linear Probing Example (contd.)
  • The next data item to be inserted is 7.
  • h0(7) = hash(7) + f(0) = 7 mod 10 = 7. Index 7 of the array is already occupied by 17.
  • So we have a collision, and we have to use the collision resolution strategy to find an empty location to insert 7.
  • Since this is our first attempt to find an empty location, i = 1.
  • Since we are using linear probing, f(i) = i, so f(1) = 1 and h1(7) = (hash(7) + 1) mod 10 = (7 + 1) mod 10 = 8 mod 10 = 8.

40
Linear Probing Example (contd.)
  • However, location 8 is already occupied by 38
  • So we have to use collision resolution once
    again, now with i2
  • Since we are using linear probing, f(i)i, so
    f(2) 2 and h2(7) (hash(7) 2) mod 10 (7
    2) mod 10 9 mod 10 9

41
Linear Probing Example (contd.)
  • However, location 9 is also occupied by 9
  • So we have to use collision resolution once
    again, now with i3
  • Since we are using linear probing, f(i)i, so
    f(3) 3 and h3(7) (hash(7) 3) mod 10 (7
    3) mod 10 10 mod 10 0
  • Location 0 is empty and so, we insert 7 at index 0

42
Linear Probing Example (contd.)
  • The next data to be inserted is 66
  • h0(66) hash(66) f(0) 66 mod 10 6
  • Location 6 is already occupied by 26. So we get a
    collision
  • We have to use the collision resolution strategy
    with linear probing as we did while inserting 7
  • Solve this as we did on the last few slides with
    insert of 7

43
Linear Probing Example (contd.)
  • 66 will collide 5 times and will get inserted at location 1.
  • Next, we have to insert 11.
  • Once again we get a collision, but 11 can be inserted after the first collision (at location 2).
  • Verify the insertion of 11 into the hash table as we did in the example before.
  • The diagram of the hash table after all the inserts is given on the next slide.

44
Linear Probing Example (contd.)
[Figure: table after all insertions: index 0 → 7, 1 → 66, 2 → 11, 6 → 26, 7 → 17, 8 → 38, 9 → 9.]
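The whole worked example can be reproduced with a short Python sketch of linear probing (the function name is mine; it returns the slot used, and raises if the table is full):

```python
def linear_probe_insert(table, x):
    # Linear probing: h_i(x) = (hash(x) + i) mod tableSize, f(i) = i,
    # with hash(x) = x mod tableSize.
    size = len(table)
    for i in range(size):
        slot = (x % size + i) % size
        if table[slot] is None:
            table[slot] = x
            return slot
    raise RuntimeError("table is full")

table = [None] * 10
for x in (17, 26, 38, 9, 7, 66, 11):
    linear_probe_insert(table, x)
print(table)  # [7, 66, 11, None, None, None, 26, 17, 38, 9]
```

The printed table matches the slides: 7 lands at index 0 after probing 7, 8, 9; 66 lands at index 1 after five collisions; 11 lands at index 2 after one.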
45
Drawbacks of Linear Probing
  • The time to find an empty cell can be quite large. For example, we had to inspect 4 locations before we found an empty location to insert 7. The same problem was encountered while inserting 66.
  • The hash table can be relatively empty while pockets of occupied cells form. For example, when we inserted 66 the lower part of the hash table was full but the upper part was entirely empty.
  • Primary clustering: several attempts are required to resolve a collision. For example, 66 collided as many as 5 times while being inserted.

46
Collision Resolution Strategy 2 Quadratic
Probing
  • In quadratic probing f(i) = i². All other techniques remain similar to linear probing.
  • Example: insert the following data into a hash table using quadratic probing as the collision resolution strategy. Assume tableSize = 10.
  • 17 26 38 9 7 66 11
  • As in the previous example, 17, 26, 38, and 9 get inserted without any collisions.

47
Quadratic Probing Example
  • When we try to insert 7, hash(7) = 7 mod 10 = 7. Location 7 is already occupied, so we get a collision.
  • We now try to find an empty location using the collision resolution strategy of quadratic probing. Since we are trying to find an empty location for the first time, i = 1.

48
Quadratic Probing Example (contd.)
  • Since we are using quadratic probing now,
    f(i)i2, so f(1) 12 1 and h1(7) (hash(7)
    1) mod 10 (7 1) mod 10 8 mod 10 8
  • Location 8 is already occupied, so we try another
    collision resolution now with i2
  • Since we are using quadratic probing now,
    f(i)i2, so f(2) 22 4 and h2(7) (hash(7)
    4) mod 10 (7 4) mod 10 11 mod 10 1

49
Quadratic Probing Example (contd.)
  • Location 1 is empty, so we insert 7 there.
  • Notice that we had far fewer collisions while inserting 7 with quadratic probing than we had with linear probing.

50
Quadratic Probing Example (contd.)
  • Let us now try to insert the next data item, 66.
  • We get a collision at location 6, and we use quadratic probing to find an empty cell.
  • The cells probed by quadratic probing are:
  • with i = 1, location 7: a collision again;
  • with i = 2, location 0: empty, so 66 is inserted here.
  • Once again, notice that we had fewer collisions than with linear probing.

51
Quadratic Probing Example (contd.)
  • Please solve the insertion of 11 by yourself
  • The diagram of the hash table after all the data
    has been inserted is given on the next slide

52
Quadratic Probing Example (contd.)
[Figure: table after all insertions: index 0 → 66, 1 → 7, 2 → 11, 6 → 26, 7 → 17, 8 → 38, 9 → 9.]
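Changing f(i) from i to i² is a one-line change to the linear-probing sketch. A minimal Python version (the function name is mine; note that, unlike linear probing, this loop can fail to find a free slot even when the table is not full):

```python
def quadratic_probe_insert(table, x):
    # Quadratic probing: h_i(x) = (hash(x) + i*i) mod tableSize, f(i) = i^2.
    size = len(table)
    for i in range(size):
        slot = (x % size + i * i) % size
        if table[slot] is None:
            table[slot] = x
            return slot
    raise RuntimeError("no empty slot found")

table = [None] * 10
for x in (17, 26, 38, 9, 7, 66, 11):
    quadratic_probe_insert(table, x)
print(table)  # [66, 7, 11, None, None, None, 26, 17, 38, 9]
```

The result matches the slides: 7 now lands at index 1 (two probes), 66 at index 0 (two probes), and 11 at index 2.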
53
Quadratic Probing Problem 1
  • There is no guarantee of finding an empty cell once the table is more than half full (see the proof on page 92; this proof is not required for the exam).

54
Quadratic Probing Problem 2
  • Standard deletion cannot be used.
  • To understand this, let us see how we find 66 in the hash table given on slide 25.
  • hash(66) = 66 mod 10 = 6.
  • Location 6 contains 26, which is not the data we are looking for.
  • This means that
  • either 66 is not in the hash table at all,
  • or 66 got stored somewhere else when we used quadratic probing to find an empty location while inserting it.

55
Quadratic Probing Problem 2 (contd.)
  • Since we just solved this example, we know that the second option is what actually happened.
  • However, the find routine does not know this.
  • So the find method has to visit each location that might have been visited by quadratic probing while inserting 66.

56
Quadratic Probing Problem 2 (contd.)
  • These locations are at a distance of 1, 4, 9, 16, 25, ... (everything being mod tableSize) from location 6 (the value returned by hash(66)).
  • Notice that these distances are i² from location 6, because quadratic probing uses f(i) = i² for i = 1, 2, 3, and so on.
  • So the find method looks at location (6 + 1) mod 10 = 7 and does not find 66.
  • Next, the find method looks at location (6 + 4) mod 10 = 10 mod 10 = 0 and finds 66.

57
Quadratic Probing Problem 2 (contd.)
  • However, what would have happened if we had deleted 26 first and then tried to find 66?
  • Since hash(66) = 66 mod 10 = 6, and location 6 is empty, the find method would have (wrongly) concluded that 66 is not in the table: the first location where 66 could have been inserted (location 6) is free, so 66 must never have been inserted.

58
Quadratic Probing Problem 2 (contd.)
  • The solution is to use a technique called lazy delete.
  • In lazy delete, along with each location we maintain a tag that is initially cleared.
  • When there is a collision while inserting, the tag is set.
  • Then quadratic (or some other) probing is used to locate an empty cell and insert the data.

59
Quadratic Probing Problem 2 (contd.)
  • With lazy delete, when we insert 66 we get a collision at location 6 (occupied by 26), and we set the tag for location 6.
  • Later on, if we delete 26, the tag remains set.
  • Now, when find encounters an empty location at location 6, it checks whether the tag is set.
  • Since the tag is set, find knows that there is other data that should have been in location 6 but got bumped to another location by the collision resolution strategy.

60
Quadratic Probing Problem 3
  • Suppose that a collision occurs while inserting at location x.
  • Then the locations that will be probed using quadratic probing are (x + 1) mod 10, (x + 4) mod 10, (x + 9) mod 10, (x + 16) mod 10, (x + 25) mod 10, and so on.

61
Quadratic Probing Problem 3 (contd.)
  • Let us substitute a value for x, say x = 5.
  • The successive locations probed by quadratic probing until an empty location is found are:
  • (5 + 1) mod 10 = 6
  • (5 + 4) mod 10 = 9
  • (5 + 9) mod 10 = 14 mod 10 = 4
  • (5 + 16) mod 10 = 21 mod 10 = 1
  • (5 + 25) mod 10 = 30 mod 10 = 0
  • (5 + 36) mod 10 = 41 mod 10 = 1
  • (5 + 49) mod 10 = 54 mod 10 = 4
  • (5 + 64) mod 10 = 69 mod 10 = 9
  • (5 + 81) mod 10 = 86 mod 10 = 6, and so on
  • Notice that some locations (1, 4, 6, 9) are probed repeatedly.
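The repeating probe sequence is easy to verify mechanically. A one-function Python sketch (the function name is mine):

```python
def quad_probes(x, table_size, attempts):
    # Slots visited by quadratic probing from slot x: (x + i^2) mod size.
    return [(x + i * i) % table_size for i in range(1, attempts + 1)]

print(quad_probes(5, 10, 9))  # [6, 9, 4, 1, 0, 1, 4, 9, 6]
```

With tableSize = 10, only the four residues 1, 4, 6, 9 (besides 0) ever appear after the first few probes, which is the clustering the slide describes; a prime tableSize avoids this short cycle.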

62
Secondary Clustering
  • This problem is called secondary clustering.
  • Secondary clustering: elements that hash to the same location always probe the same set of cells.
  • This is solved by the last collision resolution strategy we are going to study: double hashing.

63
Collision Resolution Strategy 3 Double Hashing
  • Here the probing function is f(i) = i · hash2(x).
  • hash2(x) is called the secondary hash function.
  • However, a bad choice of hash2(x) can really make matters worse.
  • Let us assume that tableSize = 10, as in the previous examples.

64
Bad Choice for hash2(x)
  • For example, suppose hash2(x) = x mod 7 and we try to insert 7.
  • hash(7) = 7 mod 10 = 7. Suppose that location 7 is already occupied, so there is a collision.
  • Now we use our collision resolution strategy with i = 1, f(i) = i · hash2(x). So f(1) = 1 · hash2(7) = 1 · (7 mod 7) = 0.
  • Therefore, h1(7) = 7.
  • In fact, f(2) also equals 0, and so h2(7) = 7.
  • So we are not going anywhere: we repeatedly probe location 7.

65
Good Choice for hash2(x)
  • An example of a good secondary hash function is hash2(x) = R - (x mod R), where R is a prime number < tableSize.
  • If tableSize = 10 (as in our example), R = 7 is a good choice.

66
Double Hashing Example
  • Example: insert the following data into a hash table using double hashing as the collision resolution strategy:
  • 89 18 49 58 69
  • 89 and 18 do not create any collisions and get inserted at locations 9 and 8 respectively.
  • h0(49) = (hash(49) + f(0)) mod 10 = (49 mod 10 + 0) mod 10 = 9. Location 9 is already occupied, so we get a collision.

67
Double Hashing Example (contd.)
  • hash2(49) = 7 - (49 mod 7) = 7 - 0 = 7.
  • So,
  • h1(49) = (hash(49) + f(1)) mod 10
  • = (49 mod 10 + 1 · hash2(49)) mod 10
  • = (9 + 7) mod 10 = 16 mod 10 = 6.
  • Location 6 is empty, and 49 is inserted there.

68
Double Hashing Example (contd.)
  • 58 and 69 also collide when we try to insert them, and each collision is resolved at the first attempt (with i = 1) using double hashing.
  • Verify that hash2(58) = 7 - (58 mod 7) = 7 - 2 = 5 and that 58 gets inserted at location 3.
  • Verify that hash2(69) = 7 - (69 mod 7) = 7 - 6 = 1 and that 69 gets inserted at location 0.
  • The hash table after all insertions is shown on the next slide.

69
Double Hashing Example Figure
[Figure: table after all insertions: index 0 → 69, 3 → 58, 6 → 49, 8 → 18, 9 → 89.]
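The double-hashing example can be checked with one more variant of the probing sketch. This assumes the key list 89, 18, 49, 58, 69 and the slide's hash2(x) = R - (x mod R) with R = 7 (the function name is mine):

```python
def double_hash_insert(table, x, R=7):
    # Double hashing: h_i(x) = (hash(x) + i * hash2(x)) mod tableSize,
    # with hash(x) = x mod tableSize and hash2(x) = R - (x mod R).
    size = len(table)
    h2 = R - (x % R)
    for i in range(size):
        slot = (x % size + i * h2) % size
        if table[slot] is None:
            table[slot] = x
            return slot
    raise RuntimeError("no empty slot found")

table = [None] * 10
for x in (89, 18, 49, 58, 69):
    double_hash_insert(table, x)
print(table)  # [69, None, None, 58, None, None, 49, None, 18, 89]
```

Because hash2 gives each key its own step size, keys that collide at the same slot follow different probe sequences, which is what defeats secondary clustering.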
70
Double Hashing Problem
  • To understand the problem, let us suppose that we are inserting 23 into the hash table on the last slide.
  • We get a collision at position 3, which is already occupied by 58.
  • Since we are using double hashing, hash2(23) = 7 - (23 mod 7) = 7 - 2 = 5.
  • So h1(23) = (hash(23) + 1 · hash2(23)) mod 10 = (3 + 1 · 5) mod 10 = 8 mod 10 = 8.
  • Position 8 is also occupied.

71
Double Hashing Problem (contd.)
  • So we try to find an empty space again using
    double hashing
  • h2(23) (hash(23) 2hash2(23)) mod 10 (3
    2 5) mod 10 13 mod 10 3
  • Location 3 is already occupied
  • We again try to find an empty space using double
    hashing
  • h3(23) (hash(23) 3hash2(23)) mod 10 (3
    3 5) mod 10 18 mod 10 8
  • Location 8 is already occupied and had already
    been probed while doing h1(23)

72
Double Hashing Problem (contd.)
  • In fact, if you try further attempts with i = 4, 5, 6 and so on, you will see that locations 3 and 8 get probed over and over.
  • The reason for this is that tableSize = 10 is not prime.
  • The solution to this problem is to make tableSize prime (e.g., 11 is a good choice for tableSize).

73
Double Hashing Ideal Secondary Hash Function
  • A properly selected secondary hash function hash2(x) ensures that the expected number of probes is close to that of a random collision resolution strategy.

74
Double Hashing vs. Linear and Quadratic Probing
  • Compared to double hashing, linear and quadratic probing are faster, because f(i) = i · hash2(x) takes longer to compute than f(i) = i or f(i) = i².

75
Rehashing
  • Rehashing tells us what to do when the hash table gets full.
  • Instead of waiting for the hash table to get completely full, it is more efficient to rehash when the table is about 70% or 80% full.
  • The most common rehashing technique is to construct a new table of approximately double the size of the original hash table.
  • Since the new table has a different size, tableSize gets a new value, and so a new hash function, hash(x) = x mod new_tableSize, has to be defined.

76
Rehashing Example
  • Example: insert 13, 15, 6, 24, 23 into an initially empty hash table. Assume tableSize = 7 and use linear probing for collision resolution. (The table is drawn in the book on pages 198-199; please see it.)
  • Since tableSize = 7, hash(x) = x mod 7.
  • After 23 is inserted, the hash table is over 70% full (5 of 7 slots occupied).
  • Rehash: new table size = 7 × 2 = 14. But 14 is not a prime number.
  • So we select the prime number closest to and greater than 14, i.e., 17, as the new tableSize.
  • The new hash function is now hash(x) = x mod 17.
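The rehashing procedure can be sketched end to end. This is an illustrative Python sketch (the helper names `insert_lp`, `next_prime`, and `rehash` are mine); it reinserts the old entries in index order, so the final layout assumes that order:

```python
def insert_lp(table, x):
    # Linear-probing insert with hash(x) = x mod len(table).
    size = len(table)
    i = x % size
    while table[i] is not None:
        i = (i + 1) % size
    table[i] = x

def next_prime(n):
    # Smallest prime >= n (trial division; fine for small table sizes).
    def is_prime(v):
        return v > 1 and all(v % d for d in range(2, int(v ** 0.5) + 1))
    while not is_prime(n):
        n += 1
    return n

def rehash(table):
    # New size: the prime closest to and greater than double the old size.
    new_table = [None] * next_prime(2 * len(table) + 1)
    for x in table:              # O(N): every element is reinserted
        if x is not None:
            insert_lp(new_table, x)
    return new_table

table = [None] * 7
for x in (13, 15, 6, 24, 23):
    insert_lp(table, x)
table = rehash(table)            # new tableSize is 17
print({i: v for i, v in enumerate(table) if v is not None})
```

Running this reproduces the slide's setup: 7 × 2 = 14 is rounded up to the prime 17, and all five keys are reinserted under hash(x) = x mod 17.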

77
Rehashing (contd.)
  • All the data from the original table has to be inserted into the new table at new locations given by the new hash function (see page 199 of the book for the diagram).
  • Rehashing is a costly operation, and it happens frequently when the hash table is small and there are a lot of insertions.
  • The time required for rehashing is O(N), since N elements need to be rehashed from the original hash table into the new one.
  • Spread over the insertions, it therefore adds only a constant amortized cost to each insertion.

78
Other Rehashing Techniques
  • Rehash when the table is half full.
  • Rehash as soon as an insertion fails.
  • Rehash beyond a certain load factor λ.
  • Technique 2 above gives the best results, since performance degrades as λ increases.

79
Advantages of Rehashing
  • Frees the programmer from worrying about tableSize while inserting data.
  • Hash tables cannot be made arbitrarily large to start with in complex programs.
  • Rehashing can be used for other data structures as well (e.g., queues).