Hashing Algorithm - PowerPoint PPT Presentation

Slides: 24
Provided by: cpw3

Transcript and Presenter's Notes

Title: Hashing Algorithm


1
Hashing Algorithm
  • 9042635 ???
  • 9142610 ???
  • 9142621 ???

2
Introduction
  • Hashing is a ubiquitous information retrieval
    strategy that provides efficient access to
    information based on a key
  • Information can usually be accessed in constant
    time
  • Hashing's drawbacks

3
Concept of hashing
  • The problem at hand is to define and implement a
    mapping from a domain of keys to a domain of
    locations
  • From the performance standpoint, the goal is to
    avoid collisions (A collision occurs when two or
    more keys map to the same location)
  • From the compactness standpoint, no application
    ever stores all keys in a domain simultaneously
    unless the size of the domain is small

4
Concept of hashing (cont)
  • The information to be retrieved is stored in a
    hash table which is best thought of as an array
    of m locations, called buckets
  • The mapping between a key and a bucket is called
    the hash function
  • The time to store and retrieve data is
    proportional to the time to compute the hash
    function

5
Hashing function
  • The ideal function, termed a perfect hash
    function, would distribute all elements across
    the buckets such that no collisions ever occurred
  • h(v) = f(v) mod m
  • Knuth (1973) suggests using a prime number as
    the value of m
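
One common motivation for a prime m can be seen in a small sketch. The key set below is invented for illustration, and f(v) is assumed to be the identity on integer keys. When every key shares a factor with m, only some of the buckets are ever used.

```python
from collections import Counter

def bucket_counts(keys, m):
    """Count how many keys land in each bucket under h(v) = v mod m."""
    return Counter(v % m for v in keys)

# Invented key set: 14 multiples of 4.
keys = list(range(0, 56, 4))

composite = bucket_counts(keys, 8)  # every key shares the factor 4 with m
prime = bucket_counts(keys, 7)      # m is prime

print("m = 8 uses buckets:", sorted(composite))
print("m = 7 uses buckets:", sorted(prime))
```

With m = 8 only buckets 0 and 4 receive keys, while with m = 7 the same keys spread over all seven buckets.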

6
Hashing function (cont)
  • It is usually better to treat v as a sequence of
    bytes and do one of the following for f(v)
  • (1) Sum or multiply all the bytes. Overflow can
    be ignored
  • (2) Use the last (or middle) byte instead of
    the first
  • (3) Use the square of a few of the middle bytes
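
The byte-based choices for f(v) above can be sketched as follows. This is a minimal illustration, not the presenters' code: the function names, the 32-bit mask used to imitate ignored overflow, and the two-byte width for the middle bytes are all assumptions.

```python
MASK32 = 0xFFFFFFFF  # assumed word size; Python ints never overflow on their own

def byte_sum(v: bytes) -> int:
    """Option (1): sum all the bytes, discarding overflow."""
    return sum(v) & MASK32

def byte_product(v: bytes) -> int:
    """Option (1), multiplicative form. Note that a zero byte makes the result 0."""
    result = 1
    for b in v:
        result = (result * b) & MASK32
    return result

def last_byte(v: bytes) -> int:
    """Option (2): use the last byte instead of the first."""
    return v[-1] if v else 0

def mid_square(v: bytes, width: int = 2) -> int:
    """Option (3): square the integer formed by a few middle bytes."""
    start = max(0, (len(v) - width) // 2)
    middle = int.from_bytes(v[start:start + width], "big")
    return (middle * middle) & MASK32

def h(v: bytes, m: int, f=byte_sum) -> int:
    """h(v) = f(v) mod m, with m the number of buckets."""
    return f(v) % m
```

For example, `h(b"abc", 13)` sums the bytes 97, 98 and 99 to 294 and returns 294 mod 13 = 8.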

7
Implementing hashing
  • The following operations are usually provided by
    an implementation of hashing
  • (1) Initialization
  • (2) Insertion
  • (3) Retrieval
  • (4) Deletion

8
Chained hashing
9
Chained hashing (cont)
  • In the worst case (where all n keys map to a
    single location), the average time to locate an
    element will be proportional to n/2.
  • In the best case (where all chains are of equal
    length), the time will be proportional to n/m.
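
A chained hash table supporting the four operations listed earlier (initialization, insertion, retrieval, deletion) might look like the sketch below. The class name, the method names, and the byte-sum hash function are assumptions chosen for illustration.

```python
class ChainedHashTable:
    def __init__(self, m: int = 11):
        # Initialization: m empty buckets, each holding a chain (a list).
        self.m = m
        self.buckets = [[] for _ in range(m)]

    def _index(self, key: str) -> int:
        # Assumed hash function: sum of the key's bytes, mod m.
        return sum(key.encode("utf-8")) % self.m

    def insert(self, key: str, value) -> None:
        chain = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:               # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))     # new key: append to the chain

    def retrieve(self, key: str):
        # Walk the chain; cost grows with the chain length.
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None

    def delete(self, key: str) -> bool:
        chain = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False

    def chain_lengths(self):
        return [len(chain) for chain in self.buckets]
```

With this hash function the keys "ab" and "ba" have the same byte sum, so they always share a chain. If every key collided this way, retrieval would degrade to a linear scan of one chain of length n, which is the worst case described above.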

10
Open addressing
11
Minimal perfect hash functions
  • A minimal perfect hash function (MPHF) is a
    perfect hash function that hashes m keys to m
    buckets with no collisions
  • Cichelli (1980) and Cercone et al. (1983)
    proposed two important concepts
  • (1) using tables of values as the parameters
  • (2) using a mapping, ordering, and searching
    (MOS) approach
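
The difference between a perfect and a minimal perfect hash function can be checked mechanically. The helper names below are hypothetical, and the toy hash function is only meant to make the definitions concrete.

```python
def is_perfect(h, keys, m):
    """True if h maps the distinct keys to distinct buckets in 0..m-1."""
    values = [h(k) for k in keys]
    in_range = all(0 <= v < m for v in values)
    return in_range and len(set(values)) == len(values)

def is_minimal_perfect(h, keys, m):
    """True if h is perfect and there are exactly as many buckets as keys."""
    return len(keys) == m and is_perfect(h, keys, m)

# Toy example: single-letter keys mapped by alphabet position.
keys = ["a", "b", "c"]
toy_hash = lambda k: ord(k) - ord("a")
```

Here `toy_hash` is minimal perfect for m = 3. For m = 5 it is still perfect but no longer minimal, because two buckets stay empty.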

12
Minimal perfect hash functions (cont)
  • Mapping: transform the key set from an original
    to a new universe
  • Ordering: place the keys in a sequence that
    determines the order in which hash values are
    assigned to keys
  • Searching: assign hash values to the keys of each
    level

Mapping → Ordering → Searching
13
Sager's method and improvement
  • Sager (1984, 1985) formalizes and extends
    Cichelli's approach
  • In the mapping step, three auxiliary (hash)
    functions are defined on the original universe of
    keys U
  • h0: U → {0, ..., m - 1}
  • h1: U → {0, ..., r - 1}
  • h2: U → {r, ..., 2r - 1}

14
Sager's method and improvement
  • The class of functions searched is
  • h(k) = (h0(k) + g(h1(k)) + g(h2(k))) (mod m)
  • Sager uses a graph that represents the
    constraints among keys
  • The mapping step goes from keys to triples to a
    special bipartite graph, the dependency graph,
    whose vertices are the h1(k) and h2(k) values and
    whose edges represent the words
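
To make the function class concrete, the sketch below reads it as h(k) = (h0(k) + g(h1(k)) + g(h2(k))) mod m and finds a table g by exhaustive search. This is not Sager's ordering and searching procedure, only a brute-force stand-in that is feasible for a handful of keys. The keys and their (h0, h1, h2) triples are invented values standing in for the output of the mapping step.

```python
from itertools import product

# Hypothetical mapping-step output: key -> (h0, h1, h2), with m = 4 and r = 2.
# h1 values lie in 0..r-1 and h2 values in r..2r-1, as in the definitions above.
M, R = 4, 2
TRIPLES = {
    "ant": (0, 0, 2),
    "bee": (1, 0, 3),
    "cat": (1, 1, 2),
    "dog": (2, 1, 3),
}

def dependency_edges(triples):
    """One edge (h1, h2) of the bipartite dependency graph per key."""
    return {key: (h1, h2) for key, (_, h1, h2) in triples.items()}

def h(key, g, triples=TRIPLES, m=M):
    """Evaluate h(k) = (h0(k) + g(h1(k)) + g(h2(k))) mod m."""
    h0, h1, h2 = triples[key]
    return (h0 + g[h1] + g[h2]) % m

def find_g(triples=TRIPLES, m=M, r=R):
    """Try every table g over the 2r vertices; return the first collision-free one."""
    for g in product(range(m), repeat=2 * r):
        values = {h(key, g, triples, m) for key in triples}
        if len(values) == len(triples):
            return g
    return None

g = find_g()
print({key: h(key, g) for key in TRIPLES})
```

Because there are m keys and m possible hash values, any collision-free g yields a minimal perfect hash function for this key set. The search space has m^(2r) tables, which is why practical methods order the keys before searching rather than enumerating every table.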

15
Sager's method and improvement
16
The algorithm
  • The mapping step

19
The algorithm (cont)
  • The ordering step

21
The algorithm (cont)
  • The searching step

23
Discussion
  • Hashing is a constant-time algorithm, and it is
    always an advantage to be able to predict the
    time needed to locate a key
  • The MPHF uses a large amount of space