Chapter 12

- Hash Table

Hash Table

- So far, the best worst-case time for searching is O(log n).
- Hash tables
- average search time of O(1).
- worst case search time of O(n).

Learning Objectives

- Develop the motivation for hashing.
- Study hash functions.
- Understand collision resolution and compare and contrast various collision resolution schemes.
- Summarize the average running times for hashing under various collision resolution schemes.
- Explore the java.util.HashMap class.

12.1 Motivation

- Let's design a data structure using an array for which the indices could be the keys of entries.
- Suppose we wanted to store the keys 1, 3, 5, 8, 10, with guaranteed one-step access to any of these.
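
The array-indexed idea above can be sketched in Java as follows. This is a minimal illustration, not a definitive implementation: the class name DirectAddressTable and its methods are hypothetical, and it assumes keys are small non-negative integers.

```java
// Direct-address table sketch: the key itself is the array index.
class DirectAddressTable {
    private final boolean[] present;   // present[k] is true when key k is stored

    // maxKey is the largest key the table must be able to hold.
    DirectAddressTable(int maxKey) {
        present = new boolean[maxKey + 1];
    }

    void insert(int key) {
        present[key] = true;           // one step: no searching involved
    }

    boolean contains(int key) {
        return key >= 0 && key < present.length && present[key];
    }

    int slots() {
        return present.length;
    }

    public static void main(String[] args) {
        DirectAddressTable table = new DirectAddressTable(10);
        for (int key : new int[] {1, 3, 5, 8, 10}) {
            table.insert(key);
        }
        System.out.println(table.contains(8));   // true
        System.out.println(table.contains(4));   // false
        System.out.println(table.slots());       // 11 slots allocated for 5 keys
    }
}
```

Note that the array needs 11 slots to hold 5 keys, because its length is dictated by the largest key rather than by the number of entries.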

- The space consumption does not depend on the actual number of entries stored.
- It depends on the range of keys.
- What if we wanted to store strings?
- For each string, we would first have to compute a numeric key that is equivalent to it.
- java.lang.String.hashCode() computes the numeric equivalent (or hashcode) of a string by an arithmetic manipulation involving its individual characters.
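
As a small illustration of that arithmetic manipulation, the sketch below recomputes a string's hashcode with the polynomial that the String.hashCode() documentation specifies, s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], and compares it with the library result. The class and method names are hypothetical.

```java
class StringHashDemo {
    // Evaluates s[0]*31^(n-1) + ... + s[n-1] with Horner's rule.
    // int overflow simply wraps around, as it does in String.hashCode().
    static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"cat", "ear", "sad", "aid"}) {
            System.out.println(s + " -> " + s.hashCode()
                    + " (recomputed: " + polynomialHash(s) + ")");
        }
    }
}
```

For example, "cat" gives 99*31^2 + 97*31 + 116 = 98262.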

- Using numeric keys directly as indices is out of the question for most applications.
- There isn't enough space.

12.2 Hashing

- A simple hash function
- table size of 10
- h(k) = k mod 10
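
A minimal sketch of this mapping in Java, assuming integer hashcodes. The class name ModHash is hypothetical. Math.floorMod is used instead of the % operator so that a negative hashcode still maps to a valid index.

```java
class ModHash {
    static final int TABLE_SIZE = 10;

    // h(k) = k mod 10, always in the range 0..9.
    static int h(int k) {
        return Math.floorMod(k, TABLE_SIZE);
    }

    public static void main(String[] args) {
        System.out.println(h(25));   // 5
        System.out.println(h(35));   // 5, so 35 collides with 25
        System.out.println(h(-7));   // 3
    }
}
```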

- ear collides with cat at position 4.
- There is empty space in the table, and it is up to the collision resolution scheme to find an appropriate position for this string.
- A better mapping function
- For any hash function one could devise, there are always hashcodes that could force the mapping function to be ineffective by generating lots of collisions.

12.3 Collision Resolution

- There are two ways to resolve collisions.
- open addressing
- Find another location for the colliding key within the hash table.
- closed addressing
- Store all keys that hash to the same location in a data structure that hangs off that location.

12.3.1 Linear Probing

- As more and more entries are hashed into the table, they tend to form clusters that get bigger and bigger.
- The number of probes on collisions gradually increases, thus slowing down the hash time to a crawl.

- Insert "cat", "ear", "sad", and "aid"

- Clustering is the downfall of linear probing, so we need to look to another method of collision resolution that avoids clustering.
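
To make the clustering effect concrete, here is a minimal linear probing sketch. It assumes integer keys, a table of size 10, and the mapping k mod 10; the class and method names are hypothetical, and insert does not check for duplicate keys. insert returns the number of slots it had to probe.

```java
class LinearProbingTable {
    private final Integer[] slots;   // null marks an empty slot

    LinearProbingTable(int capacity) {
        slots = new Integer[capacity];
    }

    private int home(int key) {
        return Math.floorMod(key, slots.length);
    }

    // Probes home, home+1, home+2, ... (wrapping around) until a free slot
    // is found. Returns the number of slots probed, or -1 if the table is full.
    int insert(int key) {
        int start = home(key);
        for (int i = 0; i < slots.length; i++) {
            int pos = (start + i) % slots.length;
            if (slots[pos] == null) {
                slots[pos] = key;
                return i + 1;
            }
        }
        return -1;
    }

    // Follows the same probe sequence as insert. Returns the slot index, or -1.
    int find(int key) {
        int start = home(key);
        for (int i = 0; i < slots.length; i++) {
            int pos = (start + i) % slots.length;
            if (slots[pos] == null) return -1;   // an empty slot ends the search
            if (slots[pos] == key) return pos;
        }
        return -1;
    }

    public static void main(String[] args) {
        LinearProbingTable t = new LinearProbingTable(10);
        System.out.println(t.insert(14));   // 1 probe: slot 4 is free
        System.out.println(t.insert(24));   // 2 probes: slot 4 taken, lands in 5
        System.out.println(t.insert(34));   // 3 probes: lands in 6
        System.out.println(t.insert(15));   // 3 probes: home slot 5 is inside the cluster
    }
}
```

The last insertion shows why clusters hurt: key 15 does not hash to 4 at all, yet it still has to walk past the cluster that grew there.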

12.3.2 Quadratic Probing

- Avoids clustering.
- When the probing stops with a failure to find an empty spot, as many as half the locations of the table may still be unoccupied.
- Probes to locations 2, 3, 6, 0, 7, and 5 are endlessly repeated, and the insertion is not done, even though half the table is empty.
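
The repeating sequence above can be reproduced with a short sketch. It assumes the probe locations are (h + i*i) mod N with N = 11 and home location h = 2. These values are not stated on the slide; they are chosen here because they generate exactly the locations listed. The class and method names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class QuadraticProbeDemo {
    // Distinct locations visited by (h + i*i) mod n for i = 0, 1, ..., n-1,
    // kept in order of first visit.
    static Set<Integer> probeLocations(int h, int n) {
        Set<Integer> seen = new LinkedHashSet<>();
        for (int i = 0; i < n; i++) {
            seen.add((h + i * i) % n);
        }
        return seen;
    }

    public static void main(String[] args) {
        // Only 6 of the 11 locations are ever examined.
        System.out.println(probeLocations(2, 11));   // [2, 3, 6, 0, 7, 5]
    }
}
```

If those six locations are all occupied, the insertion fails even though the remaining five locations (1, 4, 8, 9, 10) are never examined.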

- For any given prime N, once a location is examined twice, all locations that are examined thereafter are also ones that have been already examined.

12.3.3 Chaining

- If a collision occurs at location i of the hash table, chaining simply adds the colliding entry to a linked list that is built at that location.
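
A minimal chaining sketch, assuming string keys and a fixed table size. The class name ChainedTable and its methods are hypothetical, and the mapping from hashcode to location (floorMod by the capacity) is just one possible choice.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedTable {
    // chains.get(i) is the linked list hanging off location i.
    private final List<LinkedList<String>> chains = new ArrayList<>();

    ChainedTable(int capacity) {
        for (int i = 0; i < capacity; i++) {
            chains.add(new LinkedList<>());
        }
    }

    private int location(String key) {
        return Math.floorMod(key.hashCode(), chains.size());
    }

    // A colliding key is simply appended to the chain at its location.
    void insert(String key) {
        LinkedList<String> chain = chains.get(location(key));
        if (!chain.contains(key)) {
            chain.add(key);
        }
    }

    boolean contains(String key) {
        return chains.get(location(key)).contains(key);
    }

    int chainLength(int i) {
        return chains.get(i).size();
    }

    public static void main(String[] args) {
        ChainedTable t = new ChainedTable(1);   // capacity 1 forces every key to collide
        t.insert("cat");
        t.insert("ear");
        System.out.println(t.chainLength(0));   // 2
        System.out.println(t.contains("ear"));  // true
        System.out.println(t.contains("sad"));  // false
    }
}
```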

Running times

- We assume that the hashing process itself (hashcode and mapping) takes O(1).
- Running time of insertion is determined by the collision resolution scheme.

12.4 The java.util.HashMap Class

- Consider a university-wide database that stores student records.
- Every student is assigned a unique id (key), with which are associated several pieces of information such as name, address, credits, GPA, etc.
- These pieces of information constitute the value.

- A StudentInfo dictionary stores (id, info) pairs for all the students enrolled in the university.
- The operations corresponding to this relationship can be found in java.util.Map<K,V>.

- The Map interface also provides operations to enumerate all the keys, enumerate all the values, get the size of the dictionary, check whether the dictionary is empty, and so on.
- The java.util.HashMap class implements the dictionary abstraction as specified by the java.util.Map interface. It resolves collisions using chaining.
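
The dictionary operations can be tried out with a short example. It is only a sketch: the student ids and info strings are made up, and a plain String stands in for the richer StudentInfo value.

```java
import java.util.HashMap;
import java.util.Map;

class StudentDirectory {
    // Builds a small (id, info) dictionary; all data here is hypothetical.
    static Map<Integer, String> build() {
        Map<Integer, String> students = new HashMap<>();
        students.put(1001, "Ada, 12 credits");
        students.put(1002, "Linus, 9 credits");
        students.put(1001, "Ada, 15 credits");   // same key: the old value is replaced
        return students;
    }

    public static void main(String[] args) {
        Map<Integer, String> students = build();
        System.out.println(students.get(1001));          // Ada, 15 credits
        System.out.println(students.containsKey(1003));  // false
        System.out.println(students.size());             // 2
        System.out.println(students.isEmpty());          // false
        for (Integer id : students.keySet()) {           // enumerate all the keys
            System.out.println(id);
        }
        for (String info : students.values()) {          // enumerate all the values
            System.out.println(info);
        }
    }
}
```

HashMap makes no promise about the order in which keys and values are enumerated.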

12.4.1 Table and Load Factor

- When the no-arg constructor is used
- Default initial capacity 16
- Default load factor of 0.75.
- The table size is defined as the actual number of key-value mappings in the hash table.

- We can choose an initial capacity
- HashMap only uses capacities that are powers of 2.
- A requested capacity of 101 becomes 128.
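
One simple way to round a requested capacity up to a power of 2 is to keep doubling from 1. The sketch below illustrates that idea only; it is not presented as the library's actual code, and the class name, method name, and the cap of 2^30 are assumptions made here to keep the loop safe from int overflow.

```java
class CapacityRounding {
    static final int MAX_CAPACITY = 1 << 30;   // assumed upper bound for this sketch

    // Smallest power of 2 that is >= requested (capped at MAX_CAPACITY).
    static int roundUpToPowerOfTwo(int requested) {
        if (requested > MAX_CAPACITY) {
            requested = MAX_CAPACITY;
        }
        int capacity = 1;
        while (capacity < requested) {
            capacity <<= 1;   // double
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(101));   // 128
        System.out.println(roundUpToPowerOfTwo(16));    // 16
        System.out.println(roundUpToPowerOfTwo(17));    // 32
    }
}
```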

- An initial capacity of 128.

12.4.2 Storage of Entries

- Relevant fields in the HashMap class.
- threshold is the size threshold.
- It is the product of the capacity and the threshold load factor (N * t).

- Entry table sets up an array of chains.
- Map.Entry<K,V> is defined inside the Map<K,V> interface.
- next holds a reference to the next Entry in its linked list.
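
The storage layout described above might look roughly like the following skeleton. It is an illustrative sketch, not the library source: the class name HashMapFieldsSketch is hypothetical, and its Entry is a standalone class rather than an implementation of Map.Entry<K,V>.

```java
// Skeleton showing only the storage-related fields; operations are omitted.
class HashMapFieldsSketch<K, V> {
    // One node of a chain.
    static class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;   // next node in the same chain, or null at the end

        Entry(K key, V value, Entry<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    Entry<K, V>[] table;      // array of chains; table.length is the capacity N
    int size;                 // number of key-value mappings currently stored
    final float loadFactor;   // threshold load factor t
    int threshold;            // size threshold, N * t

    @SuppressWarnings("unchecked")
    HashMapFieldsSketch(int capacity, float loadFactor) {
        this.table = (Entry<K, V>[]) new Entry[capacity];
        this.loadFactor = loadFactor;
        this.threshold = (int) (capacity * loadFactor);
    }
}
```

With the default capacity of 16 and load factor of 0.75, the threshold works out to 16 * 0.75 = 12.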

12.4.3 Adding an Entry

- Example
- Name serves as a key to the phone number value.

- If the key argument is null, a special object, NULL_KEY, is returned; otherwise the argument key is returned as is.
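
A minimal sketch of this null-masking step follows. Only the name NULL_KEY comes from the slide; the class name and the method names maskNull and unmaskNull are assumptions made for illustration.

```java
class NullKeyMask {
    // Sentinel object that stands in for a null key inside the table.
    static final Object NULL_KEY = new Object();

    // Returns NULL_KEY for a null argument, otherwise the key unchanged.
    static Object maskNull(Object key) {
        return key == null ? NULL_KEY : key;
    }

    // Reverses maskNull when handing a key back to the caller.
    static Object unmaskNull(Object key) {
        return key == NULL_KEY ? null : key;
    }

    public static void main(String[] args) {
        System.out.println(maskNull(null) == NULL_KEY);   // true
        System.out.println(maskNull("Joe"));              // Joe
        System.out.println(unmaskNull(maskNull(null)));   // null
    }
}
```

Masking lets the rest of the code call hashCode() and equals() on every key without a separate null check.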

- Example
- h = 25 and length = 16
- The binary representations of h and length - 1 are 11001 and 01111.

12.4.3 Adding an Entry

- Since length is a power of 2, say 2^k, the binary representation of length will be 100...0 with k zeros.
- Any h is expressible as c * 2^k + r, where 0 <= r < 2^k.
- r is the result of the bit-wise and, since the c * 2^k part occupies only higher-order bits that will be zeroed out in the process.
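
The bit-wise mapping can be checked with a few lines of Java. This is a sketch under the assumption that the index is computed as h & (length - 1); the class and method names are hypothetical.

```java
class IndexForDemo {
    // Keeps only the low-order k bits of h, where length = 2^k.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int h = 25;
        int length = 16;
        System.out.println(Integer.toBinaryString(h));           // 11001
        System.out.println(Integer.toBinaryString(length - 1));  // 1111
        System.out.println(indexFor(h, length));                 // 9, binary 1001
        System.out.println(h % length);                          // 9 as well
    }
}
```

Here 25 = 1 * 16 + 9, so the remainder r = 9 survives the mask. The bit-wise and gives the same answer as a mod operation but only works when length is a power of 2.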

- The if statement triggers a rehashing process if the size is equal to or greater than the threshold.
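
The interaction between adding an entry and rehashing can be sketched as below. This is an illustrative chained map, not the library source: the class name RehashingMap is hypothetical, the table doubles when rehashing, and the size is compared with the threshold after it has been incremented. Both of those choices are assumptions of this sketch.

```java
import java.util.Objects;

class RehashingMap<K, V> {
    private static class Entry<K, V> {
        final int hash;
        final K key;
        V value;
        Entry<K, V> next;

        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private static final float LOAD_FACTOR = 0.75f;
    private Entry<K, V>[] table = newTable(16);
    private int size = 0;
    private int threshold = (int) (16 * LOAD_FACTOR);   // 12

    @SuppressWarnings("unchecked")
    private static <K, V> Entry<K, V>[] newTable(int capacity) {
        return (Entry<K, V>[]) new Entry[capacity];
    }

    // Adds or replaces a mapping; returns the previous value or null.
    V put(K key, V value) {
        int h = (key == null) ? 0 : key.hashCode();
        int i = h & (table.length - 1);
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (Objects.equals(e.key, key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        table[i] = new Entry<>(h, key, value, table[i]);   // new head of the chain
        size++;
        if (size >= threshold) {
            resize(2 * table.length);                      // triggers rehashing
        }
        return null;
    }

    V get(K key) {
        int h = (key == null) ? 0 : key.hashCode();
        for (Entry<K, V> e = table[h & (table.length - 1)]; e != null; e = e.next) {
            if (Objects.equals(e.key, key)) return e.value;
        }
        return null;
    }

    // Moves every entry into a larger table, recomputing its location.
    private void resize(int newCapacity) {
        Entry<K, V>[] fresh = newTable(newCapacity);
        for (Entry<K, V> head : table) {
            while (head != null) {
                Entry<K, V> next = head.next;
                int i = head.hash & (newCapacity - 1);
                head.next = fresh[i];
                fresh[i] = head;
                head = next;
            }
        }
        table = fresh;
        threshold = (int) (newCapacity * LOAD_FACTOR);
    }

    int size() { return size; }
    int capacity() { return table.length; }
}
```

With these settings the twelfth insertion brings the size up to the threshold of 12, so the table grows from 16 to 32 locations and the threshold becomes 24.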

12.4.4 Rehashing

12.4.5 Searching

12.5 Quadratic Probing: Repetition of Probe Locations

- Quadratic probing examines only about N/2 locations of the table before starting to repeat locations.
- Suppose a key is hashed to location h, where there is a collision.
- The following locations are examined: (h + 1^2) mod N, (h + 2^2) mod N, (h + 3^2) mod N, ...

- What if two different probes (i and j) end up at the same location? Then (h + i^2) mod N = (h + j^2) mod N, so N divides i^2 - j^2 = (i + j)(i - j).

- Since N is a prime number, it must divide one of the factors (i + j) or (i - j).
- N divides (i - j) only when at least N probes have been made already.
- N divides (i + j) when (i + j = N), at the very least.
- j = N - i
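
One way to write out the argument above in full, assuming the i-th probe examines location (h + i^2) mod N and taking j < i without loss of generality:

```latex
\begin{align*}
(h + i^2) \bmod N &= (h + j^2) \bmod N, \qquad 0 \le j < i \\
\Longrightarrow\quad i^2 - j^2 &\equiv 0 \pmod{N} \\
\Longrightarrow\quad N &\mid (i + j)(i - j) \\
\Longrightarrow\quad N &\mid (i + j) \ \text{ or } \ N \mid (i - j) \qquad (\text{$N$ prime})
\end{align*}
```

Since 0 < i - j <= i, the second case needs i >= N. The first case needs i + j >= N, and with j < i that forces i > N/2. So no location repeats during roughly the first N/2 probes, and the earliest repetition pairs probe i with probe j = N - i. For N = 11 this is i = 6 and j = 5: 36 - 25 = 11, so (h + 36) mod 11 = (h + 25) mod 11.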

12.6 Summary

- A hash table implements the dictionary operations of insert, search, and delete on (key, value) pairs.
- Given a key, a hash function for a given hash table computes an index into the table as a function of the key by first obtaining a numeric hashcode, and then mapping this hashcode to a table location.

- When a new key hashes to a location in the hash table that is already occupied, it is said to collide with the occupying key.
- Collision resolution is the process used upon collision to determine an unoccupied location in the hash table where the colliding key may be inserted.
- In searching for a key, the same hash function and collision resolution scheme must be used as for its insertion.

- A good hash function must run in O(1) time and must distribute entries uniformly over the hash table.
- Open addressing relocates a colliding entry in the hash table itself. Closed addressing stores all entries that hash to a location in a data structure that hangs off that location.
- Linear probing and quadratic probing are instances of open addressing, while chaining is an instance of closed addressing.

- Linear probing leads to clustering of entries, with the clusters becoming increasingly larger as more and more collisions occur. Clustering degrades performance significantly.
- Quadratic probing attempts to reduce clustering. On the other hand, quadratic probing may leave as many as half the hash table empty while reporting failure to insert a new entry.

- Chaining is the simplest way to resolve collisions and also results in better performance than linear probing or quadratic probing.
- The worst-case search time for linear probing, quadratic probing, and chaining is O(n).
- The load factor of a hash table is the ratio of the number of keys, n, to the capacity, N.

- The average performance of chaining depends on the load factor. For a perfect hash function that always distributes keys uniformly, the average search time for chaining is O(1).