Hashing - PowerPoint PPT Presentation

About This Presentation
Title:

Hashing

Description:

Hashing Chapter 20 – PowerPoint PPT presentation

Slides: 26
Provided by: vald160
Category:
Tags: hashing


Transcript and Presenter's Notes


1
Hashing
  • Chapter 20

2
Hash Table
  • A hash table is a data structure that allows fast
    find, insert, and delete operations (most of the
    time).
  • The simplest way to implement a hash table is an
    array.
  • Example: Suppose we want to store integers in the
    range 0-65535.

3
  • Create an array a of size 65536 with indices
    in the range 0-65535 and initialize the array
    with all zeros.
  • insert(i) -- a[i]++
  • find(i) -- is a[i] > 0?
  • remove(i) -- if (a[i] > 0) a[i]--
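
The three operations above can be sketched in Python as follows. This is a minimal illustration rather than the presenter's code: the class name DirectAddressTable is hypothetical, and it assumes that insert increments a per-key counter, which is what the find and remove rules imply.

```python
class DirectAddressTable:
    """Counting array indexed directly by the key (keys 0..65535)."""

    def __init__(self, size=65536):
        self.a = [0] * size          # a[i] = number of copies of i stored

    def insert(self, i):
        self.a[i] += 1

    def find(self, i):
        return self.a[i] > 0

    def remove(self, i):
        if self.a[i] > 0:
            self.a[i] -= 1


table = DirectAddressTable()
table.insert(42)
table.insert(42)
table.remove(42)
print(table.find(42))   # True: one copy of 42 is still stored
print(table.find(7))    # False: 7 was never inserted
```

Every operation is a single array access, but the array must be as large as the whole key range.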

4
  • If our keys are 8-letter alphabetic words, there
    are 26^8, or about 200 billion, possible keys.
  • Only a small fraction of these keys will actually
    occur
  • Conceptually, a very large array, with very few
    cells occupied.
  • We need a better way

5
  • Allow many of the different possible keys which
    can occur to be mapped to the same location under
    the action of the index function.
  • A hash function takes a key and maps it to some
    index (possibly a smaller index) in the array.
  • A collision occurs when the hash function maps
    two actual keys to the same index.

6
Hash table operations
  • Given a hash function hash(key) which returns an
    integer, the simplistic approach is as follows:
  • insert(key): A[hash(key)] = object to insert
  • find(key): is the object at A[hash(key)]?
  • remove(key): remove the object at A[hash(key)]
  • But what happens on collisions?
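
A short sketch of this simplistic approach shows why collisions matter. The table size of 10 and the hash function key mod 10 are assumptions chosen for illustration; the slide does not fix either.

```python
TABLE_SIZE = 10                      # assumed size, for illustration only
A = [None] * TABLE_SIZE

def hash_index(key):
    # Assumed hash function: key mod table size.
    return key % TABLE_SIZE

def insert(key):
    A[hash_index(key)] = key         # blindly overwrites whatever is there

def find(key):
    return A[hash_index(key)] == key

def remove(key):
    # Only clear the cell if it really holds this key.
    if A[hash_index(key)] == key:
        A[hash_index(key)] = None

insert(17)
insert(27)        # 27 also maps to index 7 and overwrites 17
print(find(27))   # True
print(find(17))   # False: 17 was lost to the collision
```

Without a collision resolution strategy, the second key silently replaces the first.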

7
Choosing a hash function
  • Desirable properties of a good hash function
  • quick and easy to compute
  • uniformly distributes the keys over the range of
    indices
  • minimizes collisions

8
Methods of building a hash function
  • Truncation
  • ignore part of the key and use the remaining part
    as the index
  • Folding
  • Partition the key into several parts and combine
    these parts to obtain the index
  • Modular arithmetic
  • Convert the key to an integer and mod by the
    table size

9
Example
  • Given 8-digit integers and a table of size 1000
  • Truncation
  • e.g. -- use the 4th, 7th and 8th digits to form
    the index: hash(62538194) = 394
  • Folding
  • e.g. -- break into groups of 3, 3, and 2 digits,
    add the parts and truncate if necessary:
    hash(62538194) = (625 + 381 + 94) mod 1000
    = 1100 mod 1000 = 100

10
Example contd
  • Modular arithmetic
  • e.g. Simply mod by the table size: hash(62538194)
    = 62538194 mod 1000 = 194
  • It seems best to have a table size which is a
    prime number for modular arithmetic, so a table
    size of 997 or 1009 would perform a little
    better.
  • A combination of these techniques may be even
    better
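
The three methods from the example can be written out as small Python functions. This is a sketch under the example's assumptions (8-digit integer keys, table size 1000); the function names are hypothetical.

```python
def hash_truncation(key):
    # Keep only the 4th, 7th and 8th digits of an 8-digit key.
    digits = f"{key:08d}"
    return int(digits[3] + digits[6] + digits[7])

def hash_folding(key, table_size=1000):
    # Split into groups of 3, 3 and 2 digits, add them, then reduce.
    digits = f"{key:08d}"
    parts = [int(digits[0:3]), int(digits[3:6]), int(digits[6:8])]
    return sum(parts) % table_size

def hash_modular(key, table_size=1000):
    # Reduce the whole key modulo the table size.
    return key % table_size

key = 62538194
print(hash_truncation(key))   # 394
print(hash_folding(key))      # 100
print(hash_modular(key))      # 194
```

Passing a prime such as 997 as table_size to hash_modular gives the variant suggested above.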

11
Collision Resolution
  • Open addressing
  • The table is an array which holds at most one
    object per index -- contiguous storage
  • Chaining
  • The table is an array of chains; all elements on
    a chain have the same index. These chains are
    sometimes called buckets -- dynamic storage

12
Open Addressing
  • Linear probing
  • This is the simplest method of collision
    resolution
  • Start with the hash index and perform a linear
    search for the desired key or an empty location
  • The table is considered circular, the search
    wraps around from the last index to the first

13
Open Addressing
  • Clustering
  • The major drawback of linear probing is that when
    the table becomes about half full, there is a
    tendency toward clustering
  • Clustering occurs when records start to appear
    as long strings of adjacent positions, which may
    have several different hash values
  • Linear searches for empty locations become longer
    and longer

14
An Example
  • Insert the items 67, 89, 17, 20, 90, 19 into an
    empty hash table using an array of size 10 and
    using the following hash function
  • hash(key) = key mod 10.
  • Use linear probing to handle collisions.
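
One way to work through this exercise is the following sketch. The function name is hypothetical, and empty cells are represented by None.

```python
def linear_probe_insert(table, key):
    """Insert key using hash(key) = key mod len(table) and linear probing."""
    size = len(table)
    index = key % size
    for _ in range(size):
        if table[index] is None:
            table[index] = key
            return index
        index = (index + 1) % size      # wrap around: the table is circular
    raise RuntimeError("table is full")

table = [None] * 10
for key in [67, 89, 17, 20, 90, 19]:
    linear_probe_insert(table, key)

print(table)
# [20, 90, 19, None, None, None, None, 67, 17, 89]
```

Here 17 collides with 67 and moves to index 8, 90 collides with 20 and moves to index 1, and 19 finds indices 9, 0 and 1 occupied before settling at index 2.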

15
Open Addressing
  • Other techniques of collision resolution
  • Rehashing
  • use a second hash function to find an alternative
    position
  • Quadratic Probing
  • if hash(key) = h, probe at locations h + 1, h + 4,
    h + 9, h + 16, etc., i.e., locations h + i^2 for
    i = 1, 2, 3, 4, ...
  • Random Probing
  • use a seeded pseudo-random number generator to
    obtain the increment
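
The rehashing idea can be sketched as follows. The second hash function 7 - (key mod 7) and the table size of 11 are assumptions chosen for illustration, not taken from the slides; a prime table size is used so that any step between 1 and 7 eventually visits every cell.

```python
TABLE_SIZE = 11     # assumed prime size, so every step reaches all cells

def step(key):
    # Assumed second hash function; it never returns 0.
    return 7 - (key % 7)

def insert(table, key):
    index = key % TABLE_SIZE
    for _ in range(TABLE_SIZE):
        if table[index] is None:
            table[index] = key
            return index
        # Collision: jump ahead by a key-dependent increment.
        index = (index + step(key)) % TABLE_SIZE
    raise RuntimeError("no free cell found")

table = [None] * TABLE_SIZE
for key in [22, 33, 44]:          # all three hash to index 0
    insert(table, key)

print(table.index(22), table.index(33), table.index(44))   # 0 2 5
```

Because the increment depends on the key, keys that collide at the same index usually follow different probe paths.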

16
Open Addressing
  • Deletions
  • deletion with open addressing is awkward. (why?)
  • lazy deletion is the preferred approach, that is,
    marking items as deleted rather than physically
    removing them from the table.
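
A possible sketch of lazy deletion with linear probing follows. The class name and the DELETED marker are hypothetical. It also suggests an answer to the "why?": emptying a cell would cut the probe path to any key that was stored beyond it.

```python
EMPTY = None
DELETED = object()        # marker for a lazily deleted cell (hypothetical name)

class LinearProbingTable:
    def __init__(self, size=10):
        self.cells = [EMPTY] * size

    def _probe(self, key):
        # Yield every index once, starting at the hash index.
        size = len(self.cells)
        start = key % size
        for i in range(size):
            yield (start + i) % size

    def find(self, key):
        for index in self._probe(key):
            cell = self.cells[index]
            if cell is EMPTY:
                return False          # a truly empty cell ends the search
            if cell is not DELETED and cell == key:
                return True
        return False

    def insert(self, key):
        if self.find(key):
            return                    # already present
        for index in self._probe(key):
            if self.cells[index] is EMPTY or self.cells[index] is DELETED:
                self.cells[index] = key
                return
        raise RuntimeError("table is full")

    def remove(self, key):
        for index in self._probe(key):
            cell = self.cells[index]
            if cell is EMPTY:
                return
            if cell is not DELETED and cell == key:
                self.cells[index] = DELETED   # mark the cell, do not empty it
                return

t = LinearProbingTable()
t.insert(17)          # stored at index 7
t.insert(27)          # collides with 17, stored at index 8
t.remove(17)
print(t.find(27))     # True: the search steps over the deleted cell at index 7
```

Had remove reset index 7 to EMPTY, the search for 27 would stop there and report it missing.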

17
Chaining
  • Advantages to linked storage
  • with a good hash function, the linked lists will
    be short
  • clustering is not a problem -- records with
    different hash indices are on different chains
  • The size of the table is of less concern
  • Deletions are easy and efficient
  • The chains could be binary search trees or other
    structures
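
A minimal chaining sketch is shown below. It uses Python lists as the chains instead of linked lists and assumes integer keys with hash(key) = key mod table size; the class name is hypothetical.

```python
class ChainedHashTable:
    def __init__(self, size=10):
        # One independent chain (bucket) per table index.
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        return self.buckets[key % len(self.buckets)]

    def insert(self, key):
        bucket = self._bucket(key)
        if key not in bucket:
            bucket.append(key)

    def find(self, key):
        return key in self._bucket(key)

    def remove(self, key):
        bucket = self._bucket(key)
        if key in bucket:
            bucket.remove(key)      # no markers needed, unlike open addressing

t = ChainedHashTable()
for key in [67, 17, 27, 20]:
    t.insert(key)
t.remove(17)
print(t.buckets[7])    # [67, 27]
print(t.find(27))      # True
print(t.find(17))      # False
```

Colliding keys share a chain, and removing one of them does not disturb the others.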

18
Load Factor
  • The load factor of a hash table is the ratio of
    the number of items in the table to the size of
    the hash table
  • n - the number of items in the table
  • t - the size of the hash table
  • the load factor λ = n/t
  • λ = 0 indicates an empty table
  • λ = 0.5 indicates a table half full

19
Load Factor
  • In open addressing, λ may never exceed 1, and in
    practice, λ > 0.5 will begin to cause clustering
    problems.
  • In chaining, there is no limit to the size of λ.

20
Linear Probing
  • Theorem: The average number of cells examined in
    an insertion using linear probing is approximately
  • (1 + 1/(1 - k)^2) / 2, where k is the load factor.
  • Theorem: The average number of cells examined in
    a successful search is approximately
  • (1 + 1/(1 - k)) / 2, where k is the load factor.
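
To get a feel for these estimates, the two formulas, (1 + 1/(1 - k)^2) / 2 for insertion and (1 + 1/(1 - k)) / 2 for a successful search, can be evaluated at a few load factors. The function names are hypothetical and the chosen load factors are only examples.

```python
def expected_probes_insert(k):
    """Estimated cells examined per insertion at load factor k."""
    return (1 + 1 / (1 - k) ** 2) / 2

def expected_probes_successful_search(k):
    """Estimated cells examined per successful search at load factor k."""
    return (1 + 1 / (1 - k)) / 2

for k in (0.5, 0.75, 0.9):
    print(k,
          round(expected_probes_insert(k), 2),
          round(expected_probes_successful_search(k), 2))
# 0.5 2.5 1.5
# 0.75 8.5 2.5
# 0.9 50.5 5.5
```

The cost of an insertion grows quickly once the table is more than half full, which matches the earlier remark about clustering.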

21
Quadratic Probing
  • Note that in linear probing, each probe tries a
    different cell. Does quadratic probing
    guarantee that, when a cell is tried, we have
    not already tried it during the course of the
    current access? Does quadratic probing guarantee
    that, when we are inserting x and the table is
    not full, x will be inserted?

22
Quadratic Probing
  • Theorem If quadratic probing is used and the
    table size is prime, then a new element can
    always be inserted if the table is at least half
    empty. Furthermore, in the course of the
    insertion, no cell is probed twice.
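
The theorem can be checked empirically for small tables. The sketch below lists the first half of the quadratic probe sequence and tests whether all probed cells are distinct; the function names and the table sizes 11 and 16 are chosen only for illustration.

```python
def quadratic_probes(h, table_size, count):
    """First `count` cells probed: h, h + 1, h + 4, h + 9, ... (mod size)."""
    return [(h + i * i) % table_size for i in range(count)]

def first_half_distinct(table_size, h=0):
    # Look at just over half the table's worth of probes.
    count = table_size // 2 + 1
    probes = quadratic_probes(h, table_size, count)
    return len(set(probes)) == len(probes)

print(quadratic_probes(0, 11, 6))     # [0, 1, 4, 9, 5, 3]
print(first_half_distinct(11))        # True: 11 is prime
print(first_half_distinct(16))        # False: 16 is not prime
```

With a prime size, these probes hit more than half of the cells, so a table that is at least half empty must offer a free one among them. With size 16 the fifth probe already returns to the starting cell.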

23
Hash Table Vs. BST
  • Insert and find operations can be implemented
    using a BST with average insert/find time of
    O(log n). However, a BST is generally a more
    powerful data structure than a hash table as it
    can easily support routines that require order,
    for example, finding the smallest/largest
    element.

24
Hash Table Vs. BST
  • If the input is sorted, a BST will perform
    poorly. Although balanced trees can be used to
    avoid the O(n) time insert/find, they are quite
    expensive to implement. Hence, if no ordering
    information is required and there is any
    suspicion that the input might be sorted, hashing
    is the data structure of choice.
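
The effect of sorted input on a plain (unbalanced) BST can be seen in a small experiment. This is an illustrative sketch; the node class, helper names, and the choice of 500 keys are assumptions.

```python
import random

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def bst_insert(root, key):
    """Iterative insert into an unbalanced BST; returns the root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        elif key > node.key:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right
        else:
            return root               # duplicate key: nothing to do

def height(root):
    """Number of nodes on the longest root-to-leaf path (iterative)."""
    best = 0
    stack = [(root, 1)] if root is not None else []
    while stack:
        node, depth = stack.pop()
        best = max(best, depth)
        for child in (node.left, node.right):
            if child is not None:
                stack.append((child, depth + 1))
    return best

def build(keys):
    root = None
    for key in keys:
        root = bst_insert(root, key)
    return root

keys = list(range(1, 501))
sorted_height = height(build(keys))       # sorted input: one long chain

random.Random(0).shuffle(keys)
shuffled_height = height(build(keys))     # shuffled input: much shallower

print(sorted_height)                      # 500
print(shuffled_height)
```

With sorted input every key becomes the right child of the previous one, so a find may have to walk through all n nodes.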

25
Applications of Hash Tables
  • Hash tables are used in implementing:
  • Symbol Tables
  • Game Programs
  • Spelling Checkers
  • HW Problems 20.1-20.6 on page 710