Sets and Maps and Hashing presentation

1
Sets and Maps (and Hashing)

Chapter 9

2
Chapter Objectives

To understand the Java Map and Set interfaces and
how to use them
To learn about hash codes and how they are used
to facilitate efficient search and retrieval
To study two forms of hash tables (open addressing
and chaining) and to understand their relative
benefits and performance tradeoffs

3
Chapter Objectives

To learn how to implement both hash table forms
To be introduced to the implementation of Maps
and Sets
To see how two earlier applications can be more
easily implemented using Map objects for data
storage

4
Review of Sets

A set is unordered and has no duplicate elements
Suppose A = {1,3,5,7,9,11}, B = {2,3,5,7,11,13}
Then
A ∪ B = {1,2,3,5,7,9,11,13}
A ∩ B = {3,5,7,11}
A − B = {1,9}
B − A = {2,13}
If C = {3,5,9}, then C ⊆ A
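
The same operations can be tried with the Java Set interface. The following is a minimal sketch, not part of the original slides: the class and method names (SetOpsDemo, union, intersection, difference, isSubset) are illustrative, and TreeSet is chosen only so the printed order is predictable.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

class SetOpsDemo {
    // Each method copies its first argument so the inputs are left unchanged.
    static Set<Integer> union(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.addAll(b);          // keep everything from both sets
        return result;
    }

    static Set<Integer> intersection(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.retainAll(b);       // keep only elements also in b
        return result;
    }

    static Set<Integer> difference(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.removeAll(b);       // drop elements that are in b
        return result;
    }

    static boolean isSubset(Set<Integer> c, Set<Integer> a) {
        return a.containsAll(c);   // true when every element of c is in a
    }

    public static void main(String[] args) {
        Set<Integer> a = new TreeSet<>(Arrays.asList(1, 3, 5, 7, 9, 11));
        Set<Integer> b = new TreeSet<>(Arrays.asList(2, 3, 5, 7, 11, 13));
        Set<Integer> c = new TreeSet<>(Arrays.asList(3, 5, 9));

        System.out.println(union(a, b));        // [1, 2, 3, 5, 7, 9, 11, 13]
        System.out.println(intersection(a, b)); // [3, 5, 7, 11]
        System.out.println(difference(a, b));   // [1, 9]
        System.out.println(difference(b, a));   // [2, 13]
        System.out.println(isSubset(c, a));     // true
    }
}
```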

5
Sets and the Set Interface

The part of the Collection hierarchy that relates
to sets
Includes three interfaces, two abstract classes,
and two actual classes

6
The Set Abstraction

A set is a collection that contains no duplicate
elements
and at most one null element
In a set, the index of an element is meaningless
If s is a set,
s.contains("apple") returns true or false
s.indexOf("apple") makes no sense
s.get(i) is also nonsensical

7
The Set Abstraction

Operations on sets include
Testing for membership
Adding (inserting) elements
Removing elements
Union
Intersection
Difference
Subset

8
The Set Interface and Methods

Has required methods for
Testing set membership
Testing for an empty set
Determining set size
Creating an iterator over the set
Has two optional methods for
Adding an element
Removing an element
Constructors enforce the no-duplicates rule, and the
add method does not insert a duplicate item

9
The Set Interface and Methods
10
Comparison of Lists and Sets

Duplicate elements
OK in a list
Not allowed in sets; Set.add returns false if you
try to insert a duplicate element
Get method
List has a get method
A set has no get method (index is meaningless)
Iterators
Lists have iterators
Can also iterate through the elements in a set
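
The differences listed above can be observed directly. This is a small sketch using ArrayList and HashSet; the class name ListVsSetDemo and the helper addTwice are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ListVsSetDemo {
    // Adds the same item twice and reports what each add call returned.
    static boolean[] addTwice(Collection<String> c, String item) {
        return new boolean[] { c.add(item), c.add(item) };
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        Set<String> set = new HashSet<>();

        boolean[] listResults = addTwice(list, "apple");
        boolean[] setResults = addTwice(set, "apple");

        System.out.println(listResults[1] + " " + list.size()); // true 2
        System.out.println(setResults[1] + " " + set.size());   // false 1

        System.out.println(list.get(0)); // apple (lists are indexed)
        for (String s : set) {           // sets have no get(i), but can be iterated
            System.out.println(s);       // apple
        }
    }
}
```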

11
Maps

A map relates one set to another set
A map is a set of ordered pairs (x,y)
where x = key and y = value (element)
For example, this map is
{(J,Jane), (B,Bill), (B2,Bill), (S,Sam), (B1,Bob)}

12
Maps

A map is a set of ordered pairs (x,y)
where x = key and y = value (element)
Keys must be unique
But values need not be unique (onto, not 1-to-1)
Each key maps to a particular value (element)
Or, you might say it corresponds to
Maps are used for very efficient storage and
retrieval of information in tables
Key is used like index into a list
But key does not need to be integer

13
Maps

Suppose we have the map
(J,Jane), (B,Bill), (B2,Bill), (S,Sam),
(B1,Bob)
And it is stored in aMap
Then
What does aMap.get("B2") return?
"Bill"
What does aMap.get("Bill") return?
null, since nothing in aMap has the key "Bill"
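
The lookups above can be checked with a HashMap. This is a minimal sketch; the class name MapGetDemo and the method buildMap are illustrative, while the keys and values are the ones from the slide.

```java
import java.util.HashMap;
import java.util.Map;

class MapGetDemo {
    // Builds the example map from the slide: keys are unique, values may repeat.
    static Map<String, String> buildMap() {
        Map<String, String> aMap = new HashMap<>();
        aMap.put("J", "Jane");
        aMap.put("B", "Bill");
        aMap.put("B2", "Bill");
        aMap.put("S", "Sam");
        aMap.put("B1", "Bob");
        return aMap;
    }

    public static void main(String[] args) {
        Map<String, String> aMap = buildMap();
        System.out.println(aMap.get("B2"));   // Bill
        System.out.println(aMap.get("Bill")); // null ("Bill" is a value, not a key)
    }
}
```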

14
Map Interface
15
Hash Tables

For maps, want to access entry by its key, not
its value
A hash table is used for such access
For efficiency, want to access element directly
by its key
As opposed to searching for key value in an array
Using a hash table we can retrieve an item in
constant time, on average, and linear time in
worst case
That is, O(1) is expected, but O(n) is worst case

16
Hash Codes and Index Calculation

Hashing idea
Transform an item's key value into an integer
Then use this integer as a numeric index

17
Hash Code Index Example

Suppose we want to store the number of occurrences of
each Unicode character in a file
A Java char can take 65,536 values (16-bit Unicode)
What to do?
Could create an array of size 65,536 and store
count of character i in array element i
This will work, but
very inefficient for a small file
Suppose file only has 100 characters!
Is there a better way?

18
Hash Code Index Calculation

Suppose we want to store the number of occurrences of
each Unicode character in a file
A Java char can take 65,536 values (16-bit Unicode)
File of 100 characters
Use a hash code for each character
But how to compute hash code?
Could do the following
Create an array of size 200 and compute index as
index = uniChar % 200
Good since it uses less space
Bad if there are collisions
2 or more characters in file hash to same value
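
The index calculation and the collision problem can be sketched as follows. The names CharIndexDemo, index, and countChars are illustrative, and the sketch deliberately does nothing to resolve collisions, so two characters that map to the same index share one counter.

```java
class CharIndexDemo {
    static final int TABLE_SIZE = 200;

    // Maps a character to a table index by taking its code modulo the table size.
    static int index(char uniChar) {
        return uniChar % TABLE_SIZE;
    }

    // Counts occurrences per table slot (NOT per character: collisions are merged).
    static int[] countChars(String text) {
        int[] counts = new int[TABLE_SIZE];
        for (char c : text.toCharArray()) {
            counts[index(c)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(index('A'));      // 65
        System.out.println(index('\u0109')); // 65, because 265 % 200 == 65 (collision)
        System.out.println(countChars("A\u0109")[65]); // 2
    }
}
```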

19
Methods for Generating Hash Codes

Usually, keys consist of strings of letters
and/or digits
The number of possible key values is much larger
than the table size
Generating a good hash code is something of an
art
Some experimentation, trial-and-error may be
required
Desirable properties of a hash function?
A random (uniform) distribution of values
Relatively simple function
Efficient to compute
Collisions can always occur---what to do?

20
Java HashCode Method

For strings, could simply sum int values of all
characters
Will return the same hash code for "sign" and "sing"
The Java API algorithm accounts for the position of
the characters as follows
String.hashCode() returns the integer
calculated by the formula s[0] × 31^(n-1) + s[1] ×
31^(n-2) + ... + s[n-1], where s[i] is the ith character
of the string, and n is the length of the string
"Cat" will have a hash code of 'C' × 31^2 + 'a' ×
31 + 't'
Because 31 is a prime number, there tend to be fewer collisions
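
Both approaches can be compared in a few lines. This is a sketch: sumHash and positionalHash are illustrative names, and positionalHash evaluates the formula above using Horner's rule, which is an implementation choice made here rather than something stated on the slide.

```java
class StringHashDemo {
    // Naive hash: sum of character codes; ignores character order.
    static int sumHash(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    // Positional hash: s[0]*31^(n-1) + ... + s[n-1], computed with Horner's rule.
    static int positionalHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(sumHash("sign") == sumHash("sing"));               // true
        System.out.println(positionalHash("sign") == positionalHash("sing")); // false
        System.out.println(positionalHash("Cat"));                            // 67510
        System.out.println("Cat".hashCode());                                 // 67510
    }
}
```

For "Cat" the arithmetic is 67 × 961 + 97 × 31 + 116 = 64387 + 3007 + 116 = 67510.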

21
Open Addressing

We consider two ways to organize hash tables
Open addressing
Chaining
For open addressing, linear probing can be used
to deal with collisions
If the table element at the computed index contains an
item with a different key, increment the index by one
Keep incrementing until you find the key or null
entry
Null indicates element is not in the table

22
Open Addressing Algorithm
23
Table Wraparound and Search Termination

As index increases, must wrap around (circular
array)
Leads to the potential of an infinite loop
How do you know when to stop searching if the
table is full and you have not found the correct
value?
Stop when the index value for the next probe is
the same as the starting index computed from the
object's hash code, or
Ensure that the table is never full by increasing
its size after an insertion if its occupancy rate
exceeds a specified threshold (sparser table has
fewer collisions)
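
A search with linear probing, wraparound, and the first stopping rule might look like the sketch below. It is an illustration under assumptions, not the chapter's implementation: the table is a plain String array, LinearProbeDemo and find are made-up names, and Math.floorMod is used to keep the starting index non-negative.

```java
class LinearProbeDemo {
    // Returns the index holding key, or the first null slot reached,
    // or -1 if the probe wraps back to its starting index (table full, key absent).
    static int find(String[] table, String key) {
        int start = Math.floorMod(key.hashCode(), table.length);
        int index = start;
        do {
            if (table[index] == null || table[index].equals(key)) {
                return index;
            }
            index = (index + 1) % table.length; // wrap around (circular array)
        } while (index != start);
        return -1;
    }

    public static void main(String[] args) {
        String[] table = new String[5];
        // "E" and "J" have hash codes 69 and 74, so both start at index 4.
        table[find(table, "E")] = "E";        // stored at 4
        table[find(table, "J")] = "J";        // 4 is taken, wraps to 0
        System.out.println(find(table, "J")); // 0
        System.out.println(find(table, "O")); // 1 (first null after probing 4 and 0)
    }
}
```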

24
Open Addressing Example

Suppose we have the following values and hash
codes

25
Open Addressing Example

Suppose we use hashCode % 5 to create the hash table
Using open addressing

31
Open Addressing Example

Suppose we use hashCode % 11 to create the hash table
Using open addressing

37
Hash Table Operations

Iterating through a hash table gives entries in
arbitrary order
Deleting from hash table
Cannot just insert a null --- why not?
Null used for stopping/not found condition
Can insert a dummy value
So, removing does not improve search time
Reducing collisions
Expand size of hash table, and rehash elements
Tradeoff between table size and search efficiency
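
The deletion problem can be made concrete with a small open-addressing set that marks removed slots with a dummy value. This is a sketch under assumptions: OpenHashSetDemo and DELETED are hypothetical names, the table never grows, and letting add reuse a deleted slot is a design choice made here, not something the slides specify.

```java
class OpenHashSetDemo {
    // Unique dummy object; compared by identity (==), never by equals.
    private static final String DELETED = new String("<deleted>");
    private final String[] table;

    OpenHashSetDemo(int capacity) {
        table = new String[capacity];
    }

    private int home(String key) {
        return Math.floorMod(key.hashCode(), table.length);
    }

    boolean contains(String key) {
        int start = home(key), i = start;
        do {
            String slot = table[i];
            if (slot == null) return false;                 // null ends the search
            if (slot != DELETED && slot.equals(key)) return true;
            i = (i + 1) % table.length;                     // skip past other keys and dummies
        } while (i != start);
        return false;
    }

    boolean add(String key) {
        if (contains(key)) return false;                    // no duplicates
        int start = home(key), i = start;
        do {
            if (table[i] == null || table[i] == DELETED) {  // reuse freed slots
                table[i] = key;
                return true;
            }
            i = (i + 1) % table.length;
        } while (i != start);
        throw new IllegalStateException("table is full");
    }

    boolean remove(String key) {
        int start = home(key), i = start;
        do {
            String slot = table[i];
            if (slot == null) return false;
            if (slot != DELETED && slot.equals(key)) {
                table[i] = DELETED;                         // mark, do not null out
                return true;
            }
            i = (i + 1) % table.length;
        } while (i != start);
        return false;
    }

    public static void main(String[] args) {
        OpenHashSetDemo set = new OpenHashSetDemo(5);
        set.add("E"); set.add("J"); set.add("O");  // all start at index 4: slots 4, 0, 1
        set.remove("J");                           // slot 0 becomes the dummy
        System.out.println(set.contains("O"));     // true: the probe passes over the dummy
        System.out.println(set.contains("J"));     // false
    }
}
```

Had remove stored null in slot 0 instead, the search for "O" would have stopped there and reported it missing.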

38
Reducing Collisions by Quadratic Probing

Linear probing tends to form clusters of keys in
the table, causing longer search chains
Quadratic probing can reduce the effect of
clustering
Increments form a quadratic series
Disadvantages?
More work to calculate next index
(multiplication, addition, and modular division)
Not all table elements are examined when looking
for an insertion index
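
The last disadvantage can be seen by listing which slots a quadratic probe sequence visits. The sketch assumes the common form index = (start + k²) % tableSize for k = 0, 1, 2, ...; the slides do not give the exact formula, and the names QuadraticProbeDemo and slotsVisited are illustrative.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class QuadraticProbeDemo {
    // Collects the distinct slots reached by the first `probes` quadratic probes,
    // in the order they are first visited.
    static Set<Integer> slotsVisited(int start, int tableSize, int probes) {
        Set<Integer> visited = new LinkedHashSet<>();
        for (int k = 0; k < probes; k++) {
            visited.add((start + k * k) % tableSize);
        }
        return visited;
    }

    public static void main(String[] args) {
        // Table of size 11, starting at index 0, 11 probes.
        Set<Integer> slots = slotsVisited(0, 11, 11);
        System.out.println(slots);        // [0, 1, 4, 9, 5, 3]
        System.out.println(slots.size()); // 6 of the 11 slots are ever examined
    }
}
```

Linear probing from the same start would reach all 11 slots in 11 probes.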

39
Chaining

Chaining is an alternative to open addressing
Each table element references a linked list that
contains all of the items that hash to the same
table index
The linked list is often called a bucket
The approach sometimes called bucket hashing
Only items that hash to the same table index
will be examined when looking for an
object
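
A chained table can be sketched with a list of linked lists. This is an illustration only: ChainedSetDemo, bucketFor, and bucketLength are hypothetical names, the table has a fixed number of buckets, and no rehashing is done.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedSetDemo {
    private final List<LinkedList<String>> buckets = new ArrayList<>();

    ChainedSetDemo(int size) {
        for (int i = 0; i < size; i++) {
            buckets.add(new LinkedList<>()); // one (initially empty) bucket per index
        }
    }

    // Picks the bucket for a key; only this bucket is ever searched.
    private LinkedList<String> bucketFor(String key) {
        return buckets.get(Math.floorMod(key.hashCode(), buckets.size()));
    }

    boolean add(String key) {
        LinkedList<String> bucket = bucketFor(key);
        if (bucket.contains(key)) return false; // no duplicates
        bucket.add(key);
        return true;
    }

    boolean contains(String key) {
        return bucketFor(key).contains(key);
    }

    boolean remove(String key) {
        return bucketFor(key).remove(key);      // a real removal, no dummy value needed
    }

    int bucketLength(int index) {
        return buckets.get(index).size();
    }

    public static void main(String[] args) {
        ChainedSetDemo set = new ChainedSetDemo(5);
        set.add("E"); set.add("J"); set.add("O"); // hash codes 69, 74, 79: all index 4
        set.add("A");                             // hash code 65: index 0
        System.out.println(set.bucketLength(4));  // 3
        System.out.println(set.bucketLength(0));  // 1
        set.remove("J");
        System.out.println(set.contains("O"));    // true
    }
}
```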

40
Chaining

Recall hashCode % 5
Chaining creates linked list for each collision
In this example
Linked list for Tom, Dick, Sam
Another linked list for Harry and Pete

41
Chaining
42
Chaining

Plusses?
Conceptually simple
Minimizes table size
Good search efficiency
Minuses?
Overhead of linked lists (more storage)
More complex (perhaps)

43
Performance of Hash Tables

Load factor is number of filled cells divided by
table size
Load factor has greatest effect on performance
The lower the load factor, the better the
performance
Why?
Less chance of collision in a sparsely populated
table
But the smaller the load factor, the more wasted space
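
To put rough numbers on this tradeoff, the sketch below uses the standard textbook approximations (attributed to Knuth) for the expected number of comparisons in a successful search: (1 + 1/(1 − L))/2 for linear probing and 1 + L/2 for chaining, where L is the load factor. These formulas do not appear in the slide text above, so treat them as an outside assumption; the class and method names are illustrative.

```java
class LoadFactorDemo {
    // Load factor = number of filled cells divided by table size.
    static double loadFactor(int filledCells, int tableSize) {
        return (double) filledCells / tableSize;
    }

    // Approximate comparisons for a successful search with linear probing.
    static double linearProbingComparisons(double load) {
        return 0.5 * (1.0 + 1.0 / (1.0 - load));
    }

    // Approximate comparisons for a successful search with chaining.
    static double chainingComparisons(double load) {
        return 1.0 + load / 2.0;
    }

    public static void main(String[] args) {
        double[] loads = { 0.25, 0.5, 0.75, 0.9 };
        for (double load : loads) {
            System.out.printf("L=%.2f  linear=%.2f  chaining=%.2f%n",
                    load, linearProbingComparisons(load), chainingComparisons(load));
        }
    }
}
```

Under these approximations, a load factor of 0.75 costs about 2.5 comparisons with linear probing and about 1.4 with chaining, and the linear-probing cost climbs steeply as L approaches 1.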

44
Performance of Hash Tables
45
Maps and Hashing

Maps use hash tables!
Hashing converts the key into an index
Index is place where corresponding value stored
Makes it possible to search efficiently
Recall, O(1), on average
Without having an (explicit) index
Of course, there is some additional overhead

46
Implementing a Hash Table
47
Implementing a Hash Table
48
Implementation of Maps and Sets

Class Object implements methods hashCode and
equals, so every class can access these methods
unless it overrides them
Object.equals compares two objects based on their
addresses, not their contents
Object.hashCode calculates an object's hash code
based on its address, not its contents
Java recommends that if you override the equals
method, then you should also override the
hashCode method
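
The effect of overriding, or not overriding, these two methods shows up as soon as objects are placed in a HashSet. In this sketch the classes Person and RawPerson are hypothetical examples, and Objects.hash is one convenient way to build a content-based hash code, not the only one.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class EqualsHashCodeDemo {
    // Overrides both methods, so equality and hashing depend on contents.
    static final class Person {
        private final String name;
        private final int id;

        Person(String name, int id) {
            this.name = name;
            this.id = id;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof Person)) return false;
            Person p = (Person) other;
            return id == p.id && name.equals(p.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, id); // equal objects get equal hash codes
        }
    }

    // Inherits Object.equals and Object.hashCode (identity based).
    static final class RawPerson {
        private final String name;

        RawPerson(String name) {
            this.name = name;
        }
    }

    public static void main(String[] args) {
        Set<Person> people = new HashSet<>();
        people.add(new Person("Jane", 1));
        System.out.println(people.contains(new Person("Jane", 1))); // true

        Set<RawPerson> raw = new HashSet<>();
        raw.add(new RawPerson("Jane"));
        System.out.println(raw.contains(new RawPerson("Jane")));    // false
    }
}
```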

49
Implementing HashSetOpen
50
Implementing Java Map and Set Interfaces

The Java API uses a hash table to implement both
the Map and Set interfaces (HashMap and HashSet)
The task of implementing the two interfaces is
simplified by the inclusion of abstract classes
AbstractMap and AbstractSet in the Collection
hierarchy

51
Nested Interface Map.Entry

One requirement on the key-value pairs for a Map
object is that they implement the interface
Map.Entry<K, V>, which is a nested interface of
interface Map
An implementer of the Map interface must contain
an inner class that provides code for the methods
in the table below
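
From the user's side, Map.Entry objects are what entrySet hands back. The sketch below uses the standard getKey, getValue, and setValue methods of Map.Entry; the class name EntryDemo and the method upperCaseValues are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

class EntryDemo {
    // Walks the entry set and rewrites each value in place through its entry.
    static void upperCaseValues(Map<String, String> map) {
        for (Map.Entry<String, String> entry : map.entrySet()) {
            entry.setValue(entry.getValue().toUpperCase());
        }
    }

    public static void main(String[] args) {
        Map<String, String> aMap = new HashMap<>();
        aMap.put("J", "Jane");
        aMap.put("S", "Sam");

        upperCaseValues(aMap);
        for (Map.Entry<String, String> entry : aMap.entrySet()) {
            // Iteration order of a HashMap is not specified.
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```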

52
Additional Applications of Maps

Can implement the phone directory using a map

53
Additional Applications of Maps

Huffman Coding Problem
Use a map for creating an array of elements and
replacing each input character by its bit string
code in the output file
Frequency table
The key will be the input character
The value is the character code string
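
Both uses of a map can be sketched briefly: one map counts character frequencies, and another maps each character to its code string for encoding. This is an illustration only; it does not build a Huffman tree, and the code table in main is a made-up example rather than the output of the Huffman algorithm.

```java
import java.util.HashMap;
import java.util.Map;

class CodeMapDemo {
    // Frequency table: key = input character, value = number of occurrences.
    static Map<Character, Integer> frequencies(String text) {
        Map<Character, Integer> freq = new HashMap<>();
        for (char c : text.toCharArray()) {
            freq.merge(c, 1, Integer::sum);
        }
        return freq;
    }

    // Code table: key = input character, value = its bit-string code.
    static String encode(String text, Map<Character, String> codes) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            String code = codes.get(c);
            if (code == null) {
                throw new IllegalArgumentException("no code for character: " + c);
            }
            out.append(code);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(frequencies("abca").get('a')); // 2

        Map<Character, String> codes = new HashMap<>();   // hypothetical code table
        codes.put('a', "0");
        codes.put('b', "10");
        codes.put('c', "11");
        System.out.println(encode("abca", codes));        // 010110
    }
}
```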

54
Chapter Review

The Set interface describes an abstract data type
that supports the same operations as a
mathematical set
The Map interface describes an abstract data type
that enables a user to access information
corresponding to a specified key
A hash table uses hashing to transform an item's
key into a table index so that insertions,
retrievals, and deletions can be performed in
expected O(1) time
A collision occurs when two keys map to the same
table index
In open addressing, linear probing is often used
to resolve collisions

55
Chapter Review

The best way to avoid collisions is to keep the
table load factor relatively low by rehashing
when the load factor reaches a value such as 0.75
In open addressing, you can't physically remove an
element from the table when you delete it; instead
you must mark it as deleted
A set view of a hash table can be obtained
through method entrySet
Two Java API implementations of the Map (Set)
interface are HashMap (HashSet) and TreeMap
(TreeSet)
