Hash Table - PowerPoint PPT Presentation

Slides: 49
Provided by: Darw2

Title: Hash Table

Chapter 12
  • Hash Table

Hash Table
  • So far, the best worst-case time for searching is
    O(log n).
  • Hash tables have an average search time of O(1)
    and a worst-case search time of O(n).

Learning Objectives
  • Develop the motivation for hashing.
  • Study hash functions.
  • Understand collision resolution and compare and
    contrast various collision resolution schemes.
  • Summarize the average running times for hashing
    under various collision resolution schemes.
  • Explore the java.util.HashMap class.

12.1 Motivation
  • Let's design a data structure using an array for
    which the indices could be the keys of entries.
  • Suppose we wanted to store the keys 1, 3, 5, 8,
    and 10, with guaranteed one-step access to any
    of them.
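
A minimal sketch of this idea, assuming non-negative integer keys with a known maximum. The class DirectAddressTable and its methods are hypothetical names made up for illustration; they do not come from the slides.

```java
// Illustrative sketch: a direct-address table where the key itself is the array index.
// Class and method names are hypothetical.
class DirectAddressTable {
    private final boolean[] present;   // present[k] is true when key k is stored

    DirectAddressTable(int maxKey) {
        present = new boolean[maxKey + 1];   // space depends on the key range, not on the entry count
    }

    void insert(int key) { present[key] = true; }

    boolean contains(int key) {
        return key >= 0 && key < present.length && present[key];
    }

    int slots() { return present.length; }

    public static void main(String[] args) {
        DirectAddressTable t = new DirectAddressTable(10);
        for (int k : new int[] {1, 3, 5, 8, 10}) t.insert(k);
        System.out.println(t.contains(8));                    // one-step lookup
        System.out.println(t.contains(4));
        System.out.println(t.slots() + " slots for 5 keys");
    }
}
```

Five keys occupy an array of 11 slots here; a single key of 1,000,000 would force an array of 1,000,001 slots.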

12.1 Motivation
  • The space consumption does not depend on the
    actual number of entries stored.
  • It depends on the range of keys.
  • What if we wanted to store strings?
  • For each string, we would first have to compute a
    numeric key that is equivalent to it.
  • java.lang.String.hashCode() computes the numeric
    equivalent (or hashcode) of a string by an
    arithmetic manipulation involving its individual
    characters.
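
As a sketch of what that arithmetic looks like, the following recomputes the hashcode by hand using the formula documented for String.hashCode(), s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], and prints it next to the built-in result. The class and method names are made up for illustration.

```java
// Sketch: recompute String.hashCode() by hand with the documented base-31 formula.
// StringHashDemo and manualHash are hypothetical names.
class StringHashDemo {
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);   // Horner's rule; int overflow simply wraps around
        }
        return h;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"cat", "ear", "sad", "aid"}) {
            System.out.println(s + " -> " + manualHash(s)
                    + " (built-in: " + s.hashCode() + ")");
        }
    }
}
```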

12.1 Motivation
  • Using numeric keys directly as indices is out of
    the question for most applications.
  • There isn't enough space

12.2 Hashing
  • A simple hash function for a table of size 10:
    h(k) = k mod 10
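
A small sketch of this mapping step, assuming the hashcode is an int. Math.floorMod is used instead of the % operator so that negative hashcodes still land inside the table; that choice, and the names below, are assumptions of this sketch rather than something the slides specify.

```java
// Sketch: map a numeric hashcode to a table location with h(k) = k mod tableSize.
// ModMapping and indexFor are hypothetical names.
class ModMapping {
    static int indexFor(int hashcode, int tableSize) {
        return Math.floorMod(hashcode, tableSize);   // result is always in 0 .. tableSize-1
    }

    public static void main(String[] args) {
        System.out.println(indexFor(14, 10));   // 4
        System.out.println(indexFor(24, 10));   // 4, so 24 collides with 14
        System.out.println(indexFor(-7, 10));   // 3, still a valid location
    }
}
```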

12.2 Hashing
  • ear collides with cat at position 4.
  • There is empty space in the table, and it is up
    to the collision resolution scheme to find an
    appropriate position for this string.
  • A better mapping function
  • For any hash function one could devise, there are
    always hashcodes that could force the mapping
    function to be ineffective by generating lots of
    collisions.

12.3 Collision Resolution
  • There are two ways to resolve collisions.
  • Open addressing: find another location for the
    colliding key within the hash table.
  • Closed addressing: store all keys that hash to
    the same location in a data structure that hangs
    off that location.

12.3.1 Linear Probing
  • As more and more entries are hashed into the
    table, they tend to form clusters that get bigger
    and bigger.
  • The number of probes on collisions gradually
    increases, thus slowing down hashing.
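
The clustering effect can be seen in a minimal linear probing sketch. It assumes integer keys, the mapping key mod capacity, and no duplicate handling or deletion; the class and method names are hypothetical.

```java
// Sketch of linear probing: on a collision, try the next location, wrapping around.
// LinearProbingTable and its methods are hypothetical names.
class LinearProbingTable {
    private final Integer[] slots;   // null marks an empty location

    LinearProbingTable(int capacity) { slots = new Integer[capacity]; }

    // Inserts key and returns the number of locations probed, or -1 if the table is full.
    int insert(int key) {
        int home = Math.floorMod(key, slots.length);
        for (int i = 0; i < slots.length; i++) {
            int pos = (home + i) % slots.length;
            if (slots[pos] == null) {
                slots[pos] = key;
                return i + 1;
            }
        }
        return -1;
    }

    // Returns the location of key, or -1 if it is not in the table.
    int indexOf(int key) {
        int home = Math.floorMod(key, slots.length);
        for (int i = 0; i < slots.length; i++) {
            int pos = (home + i) % slots.length;
            if (slots[pos] == null) return -1;    // an empty location ends the search
            if (slots[pos] == key) return pos;
        }
        return -1;
    }

    public static void main(String[] args) {
        LinearProbingTable t = new LinearProbingTable(10);
        for (int key : new int[] {14, 24, 34, 5}) {
            System.out.println(key + " took " + t.insert(key) + " probe(s)");
        }
    }
}
```

Keys 14, 24, and 34 all map to location 4 and end up in locations 4, 5, and 6. Key 5 does not collide with any of them by hashcode, yet it must probe past the cluster before finding location 7.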

12.3.1 Linear Probing
  • Insert "cat", "ear", "sad", and "aid"

12.3.1 Linear Probing
  • Clustering is the downfall of linear probing, so
    we need to look to another method of collision
    resolution that avoids clustering.

12.3.2 Quadratic Probing
  • Avoids the clustering seen with linear probing.
  • When the probing stops with a failure to find an
    empty spot, as many as half the locations of the
    table may still be unoccupied.
  • For example, the probe locations 2, 3, 6, 0, 7,
    and 5 are endlessly repeated, and the insertion
    is never done, even though half the table is
    empty.
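
The repeated sequence can be reproduced with a short sketch. It assumes probe i examines location (h + i*i) mod N, with a table of prime size N = 11 and a home location h = 2. Neither value is stated on the slide; they are chosen here because they reproduce the listed locations. The names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch: collect the distinct locations visited by quadratic probing.
// QuadraticProbeDemo and visited are hypothetical names; N = 11 and h = 2 are assumed values.
class QuadraticProbeDemo {
    // Distinct locations (h + i*i) mod n over the first `probes` probes, in first-visit order.
    static Set<Integer> visited(int h, int n, int probes) {
        Set<Integer> seen = new LinkedHashSet<>();
        for (int i = 0; i < probes; i++) {
            seen.add((h + i * i) % n);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(visited(2, 11, 100));   // [2, 3, 6, 0, 7, 5]
    }
}
```

Even after 100 probes only 6 of the 11 locations are ever examined, so an insertion fails if those 6 are all occupied.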

12.3.2 Quadratic Probing
  • For any given prime N, once a location is
    examined twice, all locations that are examined
    thereafter are also ones that have already been
    examined.

12.3.3 Chaining
  • If a collision occurs at location i of the hash
    table, chaining simply adds the colliding entry
    to a linked list that is built at that location.
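
A minimal chaining sketch, assuming integer keys and the mapping key mod capacity. The class and method names are hypothetical, and duplicates are not filtered out.

```java
import java.util.LinkedList;

// Sketch of chaining: every table location holds a linked list of the keys that hash to it.
// ChainedTable and its methods are hypothetical names.
class ChainedTable {
    private final LinkedList<Integer>[] chains;

    @SuppressWarnings("unchecked")
    ChainedTable(int capacity) {
        chains = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) chains[i] = new LinkedList<>();
    }

    private int locationOf(int key) { return Math.floorMod(key, chains.length); }

    void insert(int key) { chains[locationOf(key)].add(key); }   // colliding keys share a chain

    boolean contains(int key) { return chains[locationOf(key)].contains(key); }

    int chainLength(int location) { return chains[location].size(); }

    public static void main(String[] args) {
        ChainedTable t = new ChainedTable(10);
        t.insert(14);
        t.insert(24);   // collides with 14 at location 4
        t.insert(7);
        System.out.println(t.chainLength(4));
        System.out.println(t.contains(24));
    }
}
```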

Running times
  • We assume that the hashing process itself
    (hashcode and mapping) takes O(1).
  • Running time of insertion is determined by the
    collision resolution scheme.

12.4 The java.util.HashMap Class
  • Consider a university-wide database that stores
    student records.
  • Every student is assigned a unique id (key), with
    which are associated several pieces of
    information such as name, address, credits, gpa,
    etc.
  • These pieces of information constitute the value.

12.4 The java.util.HashMap Class
  • A StudentInfo dictionary that stores (id, info)
    pairs for all the students enrolled in the
    university.
  • The operations corresponding to this relationship
    can be found in java.util.Map<K,V>.

12.4 The java.util.HashMap Class
  • The Map interface also provides operations to
    enumerate all the keys, enumerate all the values,
    get the size of the dictionary, check whether the
    dictionary is empty, and so on.
  • The java.util.HashMap class implements the dictionary
    abstraction as specified by the java.util.Map
    interface. It resolves collisions using chaining.
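
A short usage sketch of these dictionary operations through the Map interface. The student ids and record strings are made-up sample data, and a plain String stands in for the StudentInfo value to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: dictionary operations on (id, info) pairs using java.util.HashMap.
// StudentDirectoryDemo, sample(), and all records are hypothetical.
class StudentDirectoryDemo {
    static Map<Integer, String> sample() {
        Map<Integer, String> students = new HashMap<>();
        students.put(1001, "Ada, 12 Elm St, 45 credits");     // insert
        students.put(1002, "Linus, 9 Oak Ave, 30 credits");
        return students;
    }

    public static void main(String[] args) {
        Map<Integer, String> students = sample();
        System.out.println(students.get(1001));           // search by key
        System.out.println(students.containsKey(1003));   // false: no such id
        System.out.println(students.keySet());            // enumerate keys
        System.out.println(students.values());            // enumerate values
        students.remove(1002);                            // delete
        System.out.println(students.size() + " entry, empty? " + students.isEmpty());
    }
}
```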

12.4.1 Table and Load Factor
  • When the no-arg constructor is used, the default
    initial capacity is 16 and the default load
    factor is 0.75.
  • The table size is defined as the actual number of
    key-value mappings in the hash table.

12.4.1 Table and Load Factor
  • We can choose an initial capacity, but HashMap
    only uses capacities that are powers of 2.
  • For example, a requested capacity of 101 becomes
    128.
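
The rounding can be sketched with a simple doubling loop. This illustrates the rule stated above and is not the code HashMap itself uses; the names are hypothetical, and the sketch assumes a request between 1 and 2^30.

```java
// Sketch: round a requested capacity up to the nearest power of 2.
// CapacityDemo and roundUpToPowerOfTwo are hypothetical names.
class CapacityDemo {
    // Smallest power of two that is >= requested (assumes 1 <= requested <= 2^30).
    static int roundUpToPowerOfTwo(int requested) {
        int capacity = 1;
        while (capacity < requested) {
            capacity <<= 1;   // double until the request fits
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(101));   // 128
        System.out.println(roundUpToPowerOfTwo(16));    // 16
    }
}
```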

12.4.1 Table and Load Factor
  • An initial capacity of 128.

12.4.2 Storage of Entries
  • Relevant fields in the HashMap class.
  • threshold is the size threshold: the product of
    the capacity and the threshold load factor
    (N * t).

12.4.2 Storage of Entries
  • The Entry table sets up an array of chains.
  • Map.Entry<K,V> is defined inside the Map<K,V>
    interface.
  • next holds a reference to the next Entry in its
    linked list.
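
A hedged sketch of such a chain entry and of walking a chain through its next references. ChainNode is a hypothetical stand-in for the Entry class inside HashMap, not a copy of it, and null keys are not handled here.

```java
// Sketch: a chain node with a `next` reference, plus a search that walks the chain.
// ChainNode and find are hypothetical names; the phone numbers are made up.
class ChainNode<K, V> {
    final K key;
    V value;
    ChainNode<K, V> next;   // next node in the same chain, or null at the end

    ChainNode(K key, V value, ChainNode<K, V> next) {
        this.key = key;
        this.value = value;
        this.next = next;
    }

    // Returns the value stored under key in the chain starting at head, or null if absent.
    static <K, V> V find(ChainNode<K, V> head, K key) {
        for (ChainNode<K, V> e = head; e != null; e = e.next) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }

    public static void main(String[] args) {
        ChainNode<String, String> head = new ChainNode<>("Bob", "555-0102", null);
        head = new ChainNode<>("Alice", "555-0101", head);   // new entries go to the front here
        System.out.println(find(head, "Bob"));
        System.out.println(find(head, "Carol"));
    }
}
```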

12.4.3 Adding an Entry
  • Example
  • Name serves as a key to the phone number value.

12.4.3 Adding an Entry
  • If the key argument is null, a special object,
    NULL_KEY, is returned; otherwise the argument key
    is returned as is.
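
The null-masking step described above might look like the following sketch. The method name maskNull and the enclosing class are assumptions of this example; only NULL_KEY is named in the slide. The second half shows the visible consequence, namely that java.util.HashMap accepts a null key.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: substitute a sentinel object for a null key so it can be hashed like any other key.
// NullKeyDemo and maskNull are hypothetical names.
class NullKeyDemo {
    static final Object NULL_KEY = new Object();

    static Object maskNull(Object key) {
        return key == null ? NULL_KEY : key;
    }

    public static void main(String[] args) {
        System.out.println(maskNull(null) == NULL_KEY);   // true
        System.out.println(maskNull("Alice"));            // Alice

        Map<String, String> phones = new HashMap<>();
        phones.put(null, "unlisted");                     // a null key is allowed
        System.out.println(phones.get(null));             // unlisted
    }
}
```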

12.4.3 Adding an Entry
  • Example: h = 25 and length = 16.
  • The binary representations of h and length - 1
    are 11001 and 01111.

12.4.3 Adding an Entry
  • Since length is a power of 2, say 2^k, the binary
    representation of length is 100...0 with k
    zeros, and length - 1 is a run of k ones.
  • Any h is expressible as c * 2^k + r, where
    0 <= r < 2^k.
  • r is the result of the bit-wise and, since the
    c * 2^k part occupies only higher-order bits,
    which are zeroed out in the process.
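
A small sketch of this masking step, using the example values h = 25 and length = 16: 11001 and 01111 give 01001, which is 9, the same as 25 mod 16. The method name indexFor is an assumption of this sketch, and the equivalence with mod holds only when length is a power of 2 and h is non-negative.

```java
// Sketch: map a hash value to a table index with a bit-wise and.
// IndexForDemo and indexFor are hypothetical names.
class IndexForDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);   // keeps only the low k bits when length == 2^k
    }

    public static void main(String[] args) {
        System.out.println(Integer.toBinaryString(25));       // 11001
        System.out.println(Integer.toBinaryString(16 - 1));   // 1111
        System.out.println(indexFor(25, 16));                 // 9
        System.out.println(25 % 16);                          // 9
    }
}
```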

12.4.3 Adding an Entry
  • The if statement triggers a rehashing process if
    the size is equal to or greater than the
    threshold.
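
The following sketch shows one way such a trigger could work. It assumes integer keys, chaining, a capacity that is a power of 2, a threshold equal to capacity times load factor, and a doubling of the capacity on rehash. It is an illustration under those assumptions, with hypothetical names, and not the HashMap source.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: a chained table that rehashes into a table twice as large once size >= threshold.
// RehashingTable and its methods are hypothetical names.
class RehashingTable {
    private List<List<Integer>> table;
    private int size;
    private int threshold;
    private final float loadFactor;

    RehashingTable(int capacity, float loadFactor) {   // capacity is assumed to be a power of 2
        this.loadFactor = loadFactor;
        this.table = newTable(capacity);
        this.threshold = (int) (capacity * loadFactor);
    }

    private static List<List<Integer>> newTable(int capacity) {
        List<List<Integer>> t = new ArrayList<>();
        for (int i = 0; i < capacity; i++) t.add(new ArrayList<>());
        return t;
    }

    void add(int key) {
        table.get(key & (table.size() - 1)).add(key);
        size++;
        if (size >= threshold) {
            resize(2 * table.size());
        }
    }

    private void resize(int newCapacity) {
        List<List<Integer>> old = table;
        table = newTable(newCapacity);
        for (List<Integer> chain : old) {
            for (int key : chain) {
                table.get(key & (newCapacity - 1)).add(key);   // every key is mapped again
            }
        }
        threshold = (int) (newCapacity * loadFactor);
    }

    int capacity() { return table.size(); }
    int threshold() { return threshold; }

    public static void main(String[] args) {
        RehashingTable t = new RehashingTable(4, 0.75f);
        for (int key = 1; key <= 3; key++) {
            t.add(key);
            System.out.println("capacity " + t.capacity() + ", threshold " + t.threshold());
        }
    }
}
```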

12.4.4 Rehashing
12.4.5 Searching
12.5 Quadratic Probing Repetition of Probe
  • Quadratic probing only examines N/2 locations of
    the table before starting to repeat locations.
  • Suppose a key is hashed to location h, where
    there is a collision.
  • The following locations are examined:
    (h + 1) mod N, (h + 4) mod N, (h + 9) mod N, ...,
    that is, (h + i^2) mod N for probe i.

12.5 Quadratic Probing Repetition of Probe
  • What if two different probes (i and j) end up at
    the same location?
  • Then (h + i^2) mod N = (h + j^2) mod N, so N
    divides i^2 - j^2 = (i + j)(i - j).

12.5 Quadratic Probing Repetition of Probe
  • Since N is a prime number, it must divide one of
    the factors (i + j) or (i - j).
  • N divides (i - j) only when at least N probes
    have been made already.
  • N divides (i + j) when i + j = N, at the very
    least, that is, when j = N - i.
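
The argument can be checked numerically with a short sketch. It assumes probe i examines location (h + i*i) mod N; the names are hypothetical, and the primes and home locations are arbitrary test values.

```java
// Sketch: verify that probes i and N - i coincide and count the distinct locations examined.
// ProbeRepetitionDemo and its methods are hypothetical names.
class ProbeRepetitionDemo {
    static int location(int h, int i, int n) {
        return (h + i * i) % n;
    }

    // Number of distinct locations among probes i = 0 .. n-1.
    static int distinctLocations(int h, int n) {
        boolean[] seen = new boolean[n];
        int count = 0;
        for (int i = 0; i < n; i++) {
            int pos = location(h, i, n);
            if (!seen[pos]) {
                seen[pos] = true;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int n = 11, h = 2;
        for (int i = 1; i < n; i++) {
            System.out.println("probe " + i + " and probe " + (n - i) + " -> "
                    + location(h, i, n) + ", " + location(h, n - i, n));
        }
        System.out.println(distinctLocations(h, n) + " of " + n + " locations examined");
    }
}
```

Counting the home location, the sketch finds (N + 1) / 2 distinct locations for each prime tried, which is consistent with the claim that roughly half the table is never examined.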

12.6 Summary
  • A hash table implements the dictionary operations
    of insert, search, and delete on (key, value)
    pairs.
  • Given a key, a hash function for a given hash
    table computes an index into the table as a
    function of the key by first obtaining a numeric
    hashcode, and then mapping this hashcode to a
    table location.

12.6 Summary
  • When a new key hashes to a location in the hash
    table that is already occupied, it is said to
    collide with the occupying key.
  • Collision resolution is the process used upon
    collision to determine an unoccupied location in
    the hash table where the colliding key may be
    placed.
  • In searching for a key, the same hash function
    and collision resolution scheme must be used as
    for its insertion.

12.6 Summary
  • A good hash function must run in O(1) time and
    must distribute entries uniformly over the hash
    table.
  • Open addressing relocates a colliding entry in
    the hash table itself. Closed addressing stores
    all entries that hash to a location in a data
    structure that hangs off that location.
  • Linear probing and quadratic probing are
    instances of open addressing, while chaining is
    an instance of closed addressing.

12.6 Summary
  • Linear probing leads to clustering of entries
    with the clusters becoming increasingly larger as
    more and more collisions occur. Clustering
    degrades performance significantly.
  • Quadratic probing attempts to reduce clustering.
    On the other hand, quadratic probing may leave as
    many as half the hash table empty while reporting
    failure to insert a new entry.

12.6 Summary
  • Chaining is the simplest way to resolve
    collisions and also results in better performance
    than linear probing or quadratic probing.
  • The worst-case search time for linear probing,
    quadratic probing, and chaining is O(n).
  • The load factor of a hash table is the ratio of
    the number of keys, n, to the capacity, N.

12.6 Summary
  • The average performance of chaining depends on
    the load factor. For a hash function that
    distributes keys uniformly, the average search
    time for chaining is O(1 + n/N), which is O(1)
    as long as the load factor is bounded by a
    constant.