Provided by: phi762
Learn more at: http://www.cs.gsu.edu
1
Hash table
2
Objective
  • To learn
  • Hash function
  • Linear probing
  • Quadratic probing
  • Chained hash table

3
A basic problem
  • We have to store some records and perform the
    following operations
  • add a new record
  • delete a record
  • search for a record by key
  • Find a way to do these efficiently!

4
Unsorted array
  • Use an array to store the records, in unsorted
    order
  • add - append the record as the last entry; fast,
    O(1)
  • delete a target - slow at finding the target,
    fast at filling the hole (just move the last
    entry into it); O(n)
  • search - sequential search; slow, O(n)
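
A minimal Python sketch of the unsorted-array operations, assuming each record is a (key, value) tuple. The function names are illustrative and do not come from the slides.

```python
def add(records, record):
    """Append at the end of the array: O(1)."""
    records.append(record)

def search(records, key):
    """Sequential search: O(n). Returns the index, or -1 if absent."""
    for i, (k, _) in enumerate(records):
        if k == key:
            return i
    return -1

def delete(records, key):
    """Find the target (O(n)), then fill the hole with the last entry (O(1))."""
    i = search(records, key)
    if i == -1:
        return False
    records[i] = records[-1]  # move the last entry into the hole
    records.pop()             # drop the now-duplicated last entry
    return True

records = []
add(records, (12345, "andy"))
add(records, (33333, "betty"))
add(records, (56789, "david"))
delete(records, 12345)
print(records)
```

Note that filling the hole with the last entry changes the order of the records, which is acceptable only because the array is unsorted.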

5
Sorted array
  • Use an array to store the records, keeping them
    in sorted order
  • add - insert the record in its proper position;
    much record movement; slow, O(n)
  • delete a target - closing the hole after
    deletion requires much record movement; slow,
    O(n)
  • search - binary search; fast, O(log n)
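
As a rough illustration of the sorted-array trade-off, the sketch below uses Python's standard bisect module. Storing bare integer keys instead of full records is a simplifying assumption.

```python
import bisect

def add(keys, key):
    """Binary search finds the position in O(log n),
    but shifting later elements makes the insert O(n)."""
    bisect.insort(keys, key)

def search(keys, key):
    """Binary search: O(log n). Returns the index, or -1 if absent."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return i
    return -1

keys = []
for k in (56789, 12345, 33333):
    add(keys, k)
print(keys, search(keys, 33333))
```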

6
Linked list
  • Store the records in a linked list (sorted /
    unsorted)
  • add - fast if one can insert a node anywhere;
    O(1)
  • delete a target - fast at disposing of the node,
    but slow at finding the target; O(n)
  • search - sequential search; slow, O(n) (with only
    a linked list, we cannot use binary search even
    if the list is sorted)

7
Array as table
studid    name    score
0012345   andy    81.5
0033333   betty   90
0056789   david   56.8
...
9801010   peter   20
9802020   mary    100
...
9903030   tom     73
9908080   bill    49

Consider this problem. We want to store 1000
student records and search them by student id.
8
Array as table
One naive way is to store the records in a huge
array (index 0..9999999). The student id is used as
the index, i.e. the record of the student
with studid 0012345 is stored at A[12345].

index     name    score
0
...
12345     andy    81.5
...
33333     betty   90
...
56789     david   56.8
...
9908080   bill    49
...
9999999
9
Array as table
  • Store the records in a huge array where the index
    corresponds to the key
  • add - very fast O(1)
  • delete - very fast O(1)
  • search - very fast O(1)
  • But it wastes a lot of memory! Not feasible.
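
The array-as-table idea can be sketched as below. The class name and the reduced key space of 100,000 are assumptions made to keep the demo small; the slides use indices 0..9999999, where 1000 records would occupy only 1000 / 10,000,000 = 0.01% of the slots.

```python
class DirectAddressTable:
    """Array indexed directly by key: every operation is O(1),
    but one slot is reserved for every possible key."""

    def __init__(self, key_space):
        self.slots = [None] * key_space

    def add(self, key, record):
        self.slots[key] = record

    def delete(self, key):
        self.slots[key] = None

    def search(self, key):
        return self.slots[key]

table = DirectAddressTable(100_000)   # reduced key space, for illustration only
table.add(12345, ("andy", 81.5))
print(table.search(12345))
used = sum(slot is not None for slot in table.slots)
print(f"{used} of {len(table.slots)} slots in use")
```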

10
Hash function
function Hash(key: KeyType): integer
Imagine that we have such a magic function Hash.
It maps the keys (stud_id) of the 1000 records
into the integers 0..999, one to one. No two
different keys map to the same number.
H(0012345) = 134   H(0033333) = 67
H(0056789) = 764   H(9908080) = 3
11
Hash table
To store a record, we compute Hash(stud_id) for
the record and store it at the location
Hash(stud_id) of the array. To search for a
student, we only need to peek at the location
Hash(target stud_id).

index   studid    name    score
0
...
3       9908080   bill    49
...
67      0033333   betty   90
...
134     0012345   andy    81.5
...
764     0056789   david   56.8
...
999
12
Hash table with Perfect Hash
  • Such a magic function is called a perfect hash
    function
  • add - very fast O(1)
  • delete - very fast O(1)
  • search - very fast O(1)
  • But it is generally difficult to design a perfect
    hash function (e.g. when the potential key space
    is large).

13
Hash function
  • A hash function maps a key to an index within
    a range
  • Desirable properties
  • simple and quick to calculate
  • even distribution, avoiding collisions as much as
    possible

function Hash(key: KeyType): integer
14
Division Method
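h(k) = k mod m
  • Certain values of m may not be good
  • Good values for m are prime numbers which are not
    close to exact powers of 2. For example, if you
    want to store 2000 elements, then m = 701 (m is
    the hash table length; about three elements per
    slot on average, so this choice assumes that
    collisions are resolved, e.g. by chaining) yields
    the hash function

h(k) = k mod 701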
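
A short sketch of the division method in Python, assuming integer keys. The second loop illustrates why a power of 2 can be a poor choice for m: the result then depends only on the low-order bits of the key.

```python
def h(k, m=701):
    """Division method: map integer key k to a slot in 0..m-1."""
    return k % m

for studid in (12345, 33333, 56789):
    print(studid, "->", h(studid))

# With m = 8 (a power of 2), only the lowest 3 bits of the key matter,
# so keys that differ only in their higher bits all collide.
for k in (8, 16, 24, 32):
    print(k, "->", h(k, m=8))
```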
15
Collision
  • In most cases, we cannot avoid collisions
  • Collision resolution - how to handle the case
    where two different keys map to the same index

H(0012345) = 134   H(0033333) = 67   H(0056789) = 764
H(9903030) = 3     H(9908080) = 3
16
Hash Tables
  • The problem arises because two keys hash to the
    same array entry, a collision. There are two
    ways to resolve collisions
  • Hashing with Chaining: every hash table entry
    contains a pointer to a linked list of the keys
    that hash to that entry
  • Hashing with Open Addressing: every hash table
    entry contains only one key. If a new key hashes
    to a table entry which is filled, systematically
    examine other table entries until you find an
    empty entry in which to place the new key

17
Open Addressing
  • The key is first mapped to a slot
  • If there is a collision, subsequent probes are
    performed at a constant offset c from the previous
    slot: probe i examines slot (h(k) + i c) mod m
  • If the offset constant c and the table size m are
    not relatively prime, we will not examine all the
    cells. Ex.
  • Consider m = 4 and c = 2; then only every other
    slot is checked.
  • When c = 1, the collision resolution is done as a
    linear search. This is known as linear probing.
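
The effect of the offset constant can be checked with a small sketch. It assumes that probe i examines slot (home + i c) mod m; the function name is illustrative.

```python
from math import gcd

def probe_sequence(home, c, m):
    """Slots examined by the first m probes, starting at the home slot."""
    return [(home + i * c) % m for i in range(m)]

for c in (1, 2, 3):
    seq = probe_sequence(0, c, 4)
    # Every slot is reached exactly when c and m are relatively prime.
    print(f"m=4 c={c} gcd={gcd(c, 4)} probes={seq}")
```

With m = 4 and c = 2 the sequence only alternates between two slots, while c = 1 (linear probing) and c = 3 reach all four.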

18
Linear Probing Example-1
Insert 89, 18, 49, 58, 9 into a table of size 10;
the hash function is key mod tablesize
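
Since the slide's figure is not part of the transcript, the insertion sequence can be traced with a minimal sketch, assuming the hash function is key mod table size and empty slots are marked with None.

```python
def insert_linear(table, key):
    """Insert key using linear probing; returns the slot used."""
    m = len(table)
    home = key % m
    for i in range(m):
        slot = (home + i) % m      # step one slot at a time, wrapping around
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("table is full")

table = [None] * 10
for key in (89, 18, 49, 58, 9):
    slot = insert_linear(table, key)
    print(f"{key} -> slot {slot}")
print(table)
```

Here 49, 58 and 9 all collide and wrap around to slots 0, 1 and 2, while 18 and 89 stay in their home slots 8 and 9.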
19
Linear Probing Example-2
  • Single character keys, table size m = 8
  • Hash function (maps characters to the range
    0...7)

    k       APQ  BOR  CNS  DMT  ELU  FKV  GJWZ  HIXY
    h1(k)   0    1    2    3    4    5    6     7

20
Choosing a Hash Function
  • Notice that the insertion of Q required several
    probes (5). This was caused by A and P mapping
    to slot 0, which is beside the C and D keys.
  • The performance of the hash table depends on
    having a hash function which evenly distributes
    the keys.
  • The statistics of the key distribution need to
    be accounted for. For example, choosing the
    first letter of a surname will cause problems
    depending on the nationality of the population;
    the variable names in a compiler often differ by
    only one character, e.g. t1, t2, t3, etc.
  • Consult computer science texts, such as Knuth's
    The Art of Computer Programming.

21
Clustering
  • Even with a good hash function, linear probing
    has its problems
  • The position of the initial mapping (i = 0) of key
    k is called the home position of k.
  • When several insertions map to the same home
    position, they end up placed contiguously in the
    table. This collection of keys with the same
    home position is called a cluster.
  • As clusters grow, the probability that a key will
    map to the middle of a cluster increases,
    increasing the rate of the cluster's growth.
    This tendency of linear probing to place items
    together is known as primary clustering.
  • As these clusters grow, they merge with other
    clusters, forming even bigger clusters which grow
    even faster.

22
Performance Analysis
  • If n slots in a table of size m are occupied, the
    load factor is defined as α = n / m, where α = 1
    means the table is full, and α = 0 means the
    table is empty.
  • It can be shown that the number of probes in a
    successful search, C, and the number of probes in
    an unsuccessful search, C', are approximately given
    by the standard results for linear probing:
    C ≈ (1/2)(1 + 1/(1 − α))
    C' ≈ (1/2)(1 + 1/(1 − α)^2)
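
To get a feel for how quickly the probe counts grow, the sketch below evaluates the standard approximations for linear probing, C ≈ (1/2)(1 + 1/(1 − α)) for a successful search and C' ≈ (1/2)(1 + 1/(1 − α)^2) for an unsuccessful one. These are assumed to be the formulas the slide refers to; the function names are illustrative.

```python
def probes_successful(alpha):
    """Approximate probes for a successful search at load factor alpha."""
    return 0.5 * (1 + 1 / (1 - alpha))

def probes_unsuccessful(alpha):
    """Approximate probes for an unsuccessful search at load factor alpha."""
    return 0.5 * (1 + 1 / (1 - alpha) ** 2)

print("alpha  successful  unsuccessful")
for alpha in (0.25, 0.5, 0.75, 0.9):
    print(f"{alpha:5.2f}  {probes_successful(alpha):10.2f}  "
          f"{probes_unsuccessful(alpha):12.2f}")
```

At α = 0.5 the estimates are 1.5 and 2.5 probes; at α = 0.9 they rise to about 5.5 and 50.5, which is why the table should not be allowed to fill up.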

23
Quadratic Probing
  • h_i(k) = (h(k) + f(i)) mod TS, i = 0, 1, 2, ...
    (TS is the table size)
  • h(k) = k mod TS
  • f(i) = i^2
  • Theorem 20.4 If quadratic probing is used and
    the table size is prime, then a new element can
    always be inserted if the table is at least half
    empty. Furthermore, in the course of the
    insertion, no cell is probed twice.

24
Quadratic probing-example
Insert 89, 18, 49, 58, 9 into a table of size 10;
the hash function is key mod tablesize
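
The same insertions can be traced for quadratic probing with a minimal sketch, assuming probe i examines slot (key mod TS + i^2) mod TS and empty slots are marked with None.

```python
def insert_quadratic(table, key):
    """Insert key using quadratic probing; returns the slot used."""
    ts = len(table)
    home = key % ts
    for i in range(ts):
        slot = (home + i * i) % ts   # offsets 0, 1, 4, 9, ...
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no empty slot found on this probe sequence")

table = [None] * 10
for key in (89, 18, 49, 58, 9):
    slot = insert_quadratic(table, key)
    print(f"{key} -> slot {slot}")
print(table)
```

Compared with linear probing, 58 and 9 land in slots 2 and 3 instead of 1 and 2, so the colliding keys are spread out rather than packed next to each other. A table size of 10 is used only to match the example; the insertion guarantee of Theorem 20.4 requires a prime table size.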
25
Double Hashing
  • Recall that in open addressing the sequence of
    probes follows h_i(k) = (h(k) + i c) mod m
  • We can solve the problem of primary clustering in
    linear probing by having the keys which map to
    the same home position use differing probe
    sequences. In other words, different values
    of c should be used for different keys.
  • Double hashing refers to the scheme of using
    another hash function for c:
    h_i(k) = (h1(k) + i h2(k)) mod m
  • Note that h1 and h2 need to be evaluated only
    once per key.
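
A hedged sketch of double hashing follows. The slides do not say which second hash function is used, so h2(k) = R − (k mod R) with R = 7 is an assumption (a common textbook choice that never yields a step of 0), as are the table size of 10 and the keys, which are borrowed from the probing examples.

```python
def insert_double(table, key, r=7):
    """Insert key using double hashing; returns the slot used."""
    m = len(table)
    home = key % m            # h1(k), evaluated once per key
    step = r - (key % r)      # h2(k), in 1..r, evaluated once per key
    for i in range(m):
        slot = (home + i * step) % m
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no empty slot found on this probe sequence")

table = [None] * 10
for key in (89, 18, 49, 58, 9):
    slot = insert_double(table, key)
    print(f"{key} -> slot {slot}")
print(table)
```

Keys 49 and 9 share home slot 9 but use steps 7 and 5, so they follow different probe sequences and end up in slots 6 and 4. In practice m is usually chosen prime so that every step size is relatively prime to it; with m = 10 a step of 5 would reach only two slots.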

26
Chained Hash Table
One way to handle collision is to store the
collided records in a linked list. The array now
stores pointers to such lists. If no key maps to
a certain hash value, that array entry points to
nil.
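[Figure: hash array with indices 0 .. HASHMAX. Entries
1, 2, 4 and HASHMAX are nil; entries 0, 3 and 5 point to
linked lists of records, e.g. a node with key 9903030,
name tom, score 73.]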
27
Chained Hash table
  • Hash table, where collided records are stored in
    a linked list
  • With a good hash function and an appropriate hash
    size
  • few collisions; add, delete, search very fast,
    O(1)
  • Otherwise
  • some hash values have a long list of collided
    records
  • add - just insert at the head; fast, O(1)
  • delete a target - delete from an unsorted linked
    list; slow, O(n)
  • search - sequential search; slow, O(n)
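
A minimal sketch of a chained hash table. It assumes integer keys hashed with key mod size, and it uses collections.deque as a stand-in for a hand-written linked list; the class and method names are illustrative.

```python
from collections import deque

class ChainedHashTable:
    def __init__(self, size=701):
        # One chain per slot; an empty chain plays the role of nil.
        self.chains = [deque() for _ in range(size)]

    def _chain(self, key):
        return self.chains[key % len(self.chains)]

    def add(self, key, record):
        """Insert at the head of the chain: O(1)."""
        self._chain(key).appendleft((key, record))

    def search(self, key):
        """Sequential search within one chain only."""
        for k, record in self._chain(key):
            if k == key:
                return record
        return None

    def delete(self, key):
        """Find the target in its chain, then unlink it."""
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False

table = ChainedHashTable(size=10)     # tiny size to force a collision
table.add(9903030, ("tom", 73))
table.add(9908080, ("bill", 49))      # also maps to slot 0
print(table.search(9903030), table.search(9908080))
```

Both keys land in the same chain, yet each can still be found; the cost of search and delete grows with the length of that one chain rather than with the total number of records.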

28
Common errors (page 749)
  • Providing a poor hash function
  • Not rehashing when load factor reaches 0.5.
  • More errors listed on page 749 of the book
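
One way to avoid the second error is to grow the table before the load factor passes 0.5. The sketch below is an assumption-laden illustration, not the book's code: it uses linear probing, integer keys, no duplicate handling, and grows to the next prime at least twice the old size.

```python
def next_prime(n):
    """Smallest prime >= n (trial division; fine for small tables)."""
    def is_prime(x):
        if x < 2:
            return False
        i = 2
        while i * i <= x:
            if x % i == 0:
                return False
            i += 1
        return True
    while not is_prime(n):
        n += 1
    return n

class ProbingTable:
    def __init__(self, size=11):
        self.slots = [None] * size
        self.count = 0

    @staticmethod
    def _place(slots, key):
        i = key % len(slots)
        while slots[i] is not None:      # terminates: table is never full
            i = (i + 1) % len(slots)
        slots[i] = key

    def insert(self, key):
        if (self.count + 1) / len(self.slots) > 0.5:
            self._rehash()
        self._place(self.slots, key)
        self.count += 1

    def _rehash(self):
        """Allocate a larger table and re-insert every key under the new size."""
        old = self.slots
        self.slots = [None] * next_prime(2 * len(old))
        for key in old:
            if key is not None:
                self._place(self.slots, key)

    def contains(self, key):
        i = key % len(self.slots)
        while self.slots[i] is not None:
            if self.slots[i] == key:
                return True
            i = (i + 1) % len(self.slots)
        return False

table = ProbingTable()
for key in range(100, 120):
    table.insert(key)
print(len(table.slots), table.count / len(table.slots))
```

Keys must be re-inserted rather than copied slot by slot, because each key's home position depends on the table size.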

29
In class exercises
  • 20.1, 20.2 and 20.5 in the book on page 750.