Sets and Maps and Hashing - PowerPoint PPT Presentation

Slides: 56
Provided by: phil197
Learn more at: http://www.cs.sjsu.edu
Transcript and Presenter's Notes


1
Sets and Maps (and Hashing)
  • Chapter 9

2
Chapter Objectives
  • To understand the Java Map and Set interfaces and
    how to use them
  • To learn about hash codes and how they are used
    to facilitate efficient search and retrieval
  • To study two forms of hash tables (open addressing
    and chaining) and to understand their relative
    benefits and performance tradeoffs

3
Chapter Objectives
  • To learn how to implement both hash table forms
  • To be introduced to the implementation of Maps
    and Sets
  • To see how two earlier applications can be more
    easily implemented using Map objects for data
    storage

4
Review of Sets
  • A set is unordered and has no duplicate elements
  • Suppose A = {1,3,5,7,9,11}, B = {2,3,5,7,11,13}
  • Then
  • A ∪ B = {1,2,3,5,7,9,11,13}
  • A ∩ B = {3,5,7,11}
  • A − B = {1,9}
  • B − A = {2,13}
  • If C = {3,5,9}, then C ⊆ A
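
The set operations reviewed above map onto standard java.util.Set methods. The following is a minimal sketch, not code from the presentation; the class and method names (SetOpsDemo, union, and so on) are made up for illustration.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

class SetOpsDemo {
    // Each method copies its first argument so the inputs are left unchanged.
    static Set<Integer> union(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.addAll(b);       // A ∪ B
        return result;
    }

    static Set<Integer> intersection(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.retainAll(b);    // A ∩ B
        return result;
    }

    static Set<Integer> difference(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.removeAll(b);    // A − B
        return result;
    }

    static boolean isSubset(Set<Integer> c, Set<Integer> a) {
        return a.containsAll(c); // C ⊆ A
    }

    public static void main(String[] args) {
        Set<Integer> a = new TreeSet<>(Arrays.asList(1, 3, 5, 7, 9, 11));
        Set<Integer> b = new TreeSet<>(Arrays.asList(2, 3, 5, 7, 11, 13));
        System.out.println(union(a, b));
        System.out.println(intersection(a, b));
        System.out.println(difference(a, b));
        System.out.println(difference(b, a));
        System.out.println(isSubset(new TreeSet<>(Arrays.asList(3, 5, 9)), a));
    }
}
```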

5
Sets and the Set Interface
  • The part of the Collection hierarchy that relates
    to sets
  • Includes three interfaces, two abstract classes,
    and two actual classes

6
The Set Abstraction
  • A set is a collection that contains no duplicate
    elements
  • And at most, one null element
  • In a set, index of an element is meaningless
  • If s is a set,
  • s.contains("apple") returns true or false
  • s.indexOf("apple") makes no sense
  • s.get(i) is also nonsensical

7
The Set Abstraction
  • Operations on sets include
  • Testing for membership
  • Adding (inserting) elements
  • Removing elements
  • Union
  • Intersection
  • Difference
  • Subset

8
The Set Interface and Methods
  • Has required methods for
  • Testing set membership
  • Testing for an empty set
  • Determining set size
  • Creating an iterator over the set
  • Two optional methods for
  • Adding an element
  • Removing an element
  • Constructors enforce the no-duplicates rule, and
  • The add method does not insert a duplicate item

9
The Set Interface and Methods
10
Comparison of Lists and Sets
  • Duplicate elements
  • OK in a list
  • Not allowed in sets: Set.add returns false if you
    try to insert a duplicate element
  • Get method
  • List has a get method
  • A set has no get method (index is meaningless)
  • Iterators
  • Lists have iterators
  • Can also iterate through elements in a set
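
The duplicate-handling difference can be checked directly. This is a small sketch; the class and helper names (ListVsSetDemo, secondAddSucceeds) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;

class ListVsSetDemo {
    // Adds the same item twice and reports whether the second add was accepted.
    static boolean secondAddSucceeds(Collection<String> c, String item) {
        c.add(item);
        return c.add(item);
    }

    public static void main(String[] args) {
        // A list accepts the duplicate; a set rejects it and returns false.
        System.out.println(secondAddSucceeds(new ArrayList<String>(), "apple"));
        System.out.println(secondAddSucceeds(new HashSet<String>(), "apple"));
    }
}
```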

11
Maps
  • A map relates one set to another set
  • Map is a set of ordered pairs (x,y)
  • Where x = key and y = value (element)
  • For example
  • This map is
  • {(J,Jane), (B,Bill), (B2,Bill), (S,Sam), (B1,Bob)}

12
Maps
  • Map is a set of ordered pairs (x,y)
  • Where x = key and y = value (element)
  • Keys must be unique
  • But values need not be unique (the mapping may be
    many-to-one; it need not be 1-to-1)
  • Each key maps to a particular value (element)
  • Or, you might say it corresponds to
  • Maps are used for very efficient storage and
    retrieval of information in tables
  • Key is used like an index into a list
  • But the key does not need to be an integer

13
Maps
  • Suppose we have the map
  • {(J,Jane), (B,Bill), (B2,Bill), (S,Sam),
    (B1,Bob)}
  • And it is stored in aMap
  • Then
  • What does aMap.get("B2") return?
  • "Bill"
  • What does aMap.get("Bill") return?
  • null, since nothing in aMap has key "Bill"
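
The two lookups above can be reproduced with java.util.HashMap. This is an illustrative sketch; the wrapper class MapGetDemo and the method buildMap are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class MapGetDemo {
    // Builds the example map from the slide: keys are unique, values may repeat.
    static Map<String, String> buildMap() {
        Map<String, String> aMap = new HashMap<>();
        aMap.put("J", "Jane");
        aMap.put("B", "Bill");
        aMap.put("B2", "Bill");
        aMap.put("S", "Sam");
        aMap.put("B1", "Bob");
        return aMap;
    }

    public static void main(String[] args) {
        Map<String, String> aMap = buildMap();
        System.out.println(aMap.get("B2"));   // lookup by key
        System.out.println(aMap.get("Bill")); // "Bill" is a value, not a key
    }
}
```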

14
Map Interface
15
Hash Tables
  • For maps, want to access entry by its key, not
    its value
  • A hash table is used for such access
  • For efficiency, want to access element directly
    by its key
  • As opposed to searching for key value in an array
  • Using a hash table we can retrieve an item in
    constant time, on average, and linear time in
    worst case
  • That is, O(1) is expected, but O(n) is worst case

16
Hash Codes and Index Calculation
  • Hashing idea
  • Transform an item's key value into an integer
  • Then use this integer as a numeric index

17
Hash Code Index Example
  • Suppose we want to store the number of occurrences
    of each Unicode character in a file
  • There are 65,536 possible 16-bit Unicode (Java
    char) values
  • What to do?
  • Could create an array of size 65,536 and store
    count of character i in array element i
  • This will work, but
  • very inefficient for a small file
  • Suppose file only has 100 characters!
  • Is there a better way?

18
Hash Code Index Calculation
  • Suppose we want to store the number of occurrences
    of each Unicode character in a file
  • There are 65,536 possible 16-bit Unicode (Java
    char) values
  • File of 100 characters
  • Use a hash code for each character
  • But how to compute hash code?
  • Could do the following
  • Create an array of size 200 and compute index as
    index = uniChar % 200
  • Good since it uses less space
  • Bad if there are collisions
  • 2 or more characters in file hash to same value
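
A small sketch of the index calculation above, showing how two different characters can land on the same array element. The class and method names are hypothetical, and no collision handling is attempted here.

```java
class CharCountDemo {
    static final int TABLE_SIZE = 200;

    // index = uniChar % 200, as on the slide
    static int indexFor(char c) {
        return c % TABLE_SIZE;
    }

    // Counts characters by table index; colliding characters share a counter.
    static int[] countByIndex(String text) {
        int[] counts = new int[TABLE_SIZE];
        for (int i = 0; i < text.length(); i++) {
            counts[indexFor(text.charAt(i))]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        char a = 'A';              // code 65
        char other = (char) 265;   // code 265, and 265 % 200 is also 65
        System.out.println(indexFor(a) + " " + indexFor(other));
    }
}
```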

19
Methods for Generating Hash Codes
  • Usually, keys consist of strings of letters
    and/or digits
  • The number of possible key values is much larger
    than the table size
  • Generating a good hash code is something of an
    art
  • Some experimentation, trial-and-error may be
    required
  • Desirable properties of a hash function?
  • A random (uniform) distribution of values
  • Relatively simple function
  • Efficient to compute
  • Collisions can always occur---what to do?

20
Java HashCode Method
  • For strings, could simply sum int values of all
    characters
  • Will return the same hash code for "sign" and "sing"
  • The Java API algorithm accounts for position of
    the characters as follows
  • String.hashCode() returns the integer
    calculated by the formula s[0] x 31^(n-1) + s[1] x
    31^(n-2) + ... + s[n-1], where s[i] is the ith
    character of the string, and n is the length of
    the string
  • "Cat" will have a hash code of 'C' x 31^2 + 'a' x
    31 + 't'
  • Since 31 is a prime number, fewer collisions
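
The two hashing ideas above can be compared in a few lines. This is a sketch for illustration: sumHash and positionalHash are hypothetical helper names, and positionalHash evaluates the formula by repeated multiplication by 31 rather than by computing powers.

```java
class StringHashDemo {
    // Naive hash: sum of character codes, so anagrams collide.
    static int sumHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h += s.charAt(i);
        }
        return h;
    }

    // s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], evaluated left to right.
    static int positionalHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(sumHash("sign") == sumHash("sing"));
        System.out.println(positionalHash("sign") == positionalHash("sing"));
        System.out.println(positionalHash("Cat") == "Cat".hashCode());
    }
}
```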

21
Open Addressing
  • We consider two ways to organize hash tables
  • Open addressing
  • Chaining
  • For open addressing, linear probing can be used
    to deal with collisions
  • If the table element at the hashed index contains
    an item with a different key, increment the index
    by one
  • Keep incrementing until you find the key or a null
    entry
  • A null entry indicates the key is not in the table

22
Open Addressing Algorithm
23
Table Wraparound and Search Termination
  • As index increases, must wrap around (circular
    array)
  • Leads to the potential of an infinite loop
  • How do you know when to stop searching if the
    table is full and you have not found the correct
    value?
  • Stop when the index value for the next probe is
    the same as the starting index computed from the
    object's hash code, or
  • Ensure that the table is never full by increasing
    its size after an insertion if its occupancy rate
    exceeds a specified threshold (sparser table has
    fewer collisions)
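
Linear probing with wraparound, combined with the second termination strategy (grow the table before it can fill up), might look roughly like the sketch below. It stores only String keys, and the class name, the 0.75 threshold, and the growth rule (2n + 1) are assumptions chosen for illustration, not taken from the slides.

```java
class LinearProbeSet {
    private static final double LOAD_THRESHOLD = 0.75; // assumed value
    private String[] table = new String[5];
    private int size = 0;

    // Returns the index holding key, or the first null slot reached while probing.
    // Terminates because rehashing keeps at least one slot empty.
    private int find(String key) {
        int index = key.hashCode() % table.length;
        if (index < 0) {
            index += table.length;      // hashCode may be negative
        }
        while (table[index] != null && !table[index].equals(key)) {
            index++;
            if (index == table.length) {
                index = 0;              // wrap around (circular array)
            }
        }
        return index;
    }

    boolean contains(String key) {
        return table[find(key)] != null;
    }

    boolean add(String key) {
        int index = find(key);
        if (table[index] != null) {
            return false;               // already present
        }
        table[index] = key;
        size++;
        if ((double) size / table.length > LOAD_THRESHOLD) {
            rehash();
        }
        return true;
    }

    // Allocates a larger table and reinserts every key.
    private void rehash() {
        String[] old = table;
        table = new String[2 * old.length + 1];
        size = 0;
        for (String key : old) {
            if (key != null) {
                add(key);
            }
        }
    }

    int capacity() {
        return table.length;
    }

    public static void main(String[] args) {
        LinearProbeSet set = new LinearProbeSet();
        set.add("Tom");
        set.add("Dick");
        System.out.println(set.contains("Tom") + " " + set.contains("Pete"));
    }
}
```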

24
Open Addressing Example
  • Suppose we have the following values and hash
    codes

25
Open Addressing Example
  • Suppose we use hashCode % 5 to create hash table
  • Using open addressing

26
Open Addressing Example
  • Suppose we use hashCode % 5 to create hash table
  • Using open addressing

27
Open Addressing Example
  • Suppose we use hashCode % 5 to create hash table
  • Using open addressing

28
Open Addressing Example
  • Suppose we use hashCode % 5 to create hash table
  • Using open addressing

29
Open Addressing Example
  • Suppose we use hashCode % 5 to create hash table
  • Using open addressing

30
Open Addressing Example
  • Suppose we use hashCode % 5 to create hash table
  • Using open addressing

31
Open Addressing Example
  • Suppose we use hashCode % 11 to create hash table
  • Using open addressing

32
Open Addressing Example
  • Suppose we use hashCode % 11 to create hash table
  • Using open addressing

33
Open Addressing Example
  • Suppose we use hashCode % 11 to create hash table
  • Using open addressing

34
Open Addressing Example
  • Suppose we use hashCode % 11 to create hash table
  • Using open addressing

35
Open Addressing Example
  • Suppose we use hashCode % 11 to create hash table
  • Using open addressing

36
Open Addressing Example
  • Suppose we use hashCode % 11 to create hash table
  • Using open addressing
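
The tables for this example did not survive in the transcript. The sketch below rebuilds tables of size 5 and size 11 by linear probing, using the five names mentioned on the later chaining slide (Tom, Dick, Harry, Sam, Pete). The insertion order and the class name ProbeExample are assumptions.

```java
import java.util.Arrays;

class ProbeExample {
    // Inserts each name at hashCode % tableSize, probing linearly on collision.
    // Assumes names.length <= tableSize, so an empty slot always exists.
    static String[] build(String[] names, int tableSize) {
        String[] table = new String[tableSize];
        for (String name : names) {
            int index = Math.floorMod(name.hashCode(), tableSize);
            while (table[index] != null) {
                index = (index + 1) % tableSize; // next slot, with wraparound
            }
            table[index] = name;
        }
        return table;
    }

    public static void main(String[] args) {
        String[] names = {"Tom", "Dick", "Harry", "Sam", "Pete"};
        System.out.println(Arrays.toString(build(names, 5)));
        System.out.println(Arrays.toString(build(names, 11)));
    }
}
```

With the smaller table every slot ends up occupied and several names are displaced from their home index; with 11 slots most names land where they hash.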

37
Hash Table Operations
  • Iterating through a hash table gives entries in
    arbitrary order
  • Deleting from hash table
  • Cannot just insert a null --- why not?
  • Null used for stopping/not found condition
  • Can insert a dummy value
  • So, removing does not improve search time
  • Reducing collisions
  • Expand size of hash table, and rehash elements
  • Tradeoff between table size and search efficiency
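
To see why a deleted slot cannot simply be set to null, the sketch below lets the caller choose between nulling the slot and writing a dummy marker. All names (TombstoneDemo, DELETED, the useMarker flag) are hypothetical, and the table is assumed never to fill up or receive duplicate keys.

```java
class TombstoneDemo {
    // Unique marker object; compared by identity, never by equals.
    private static final String DELETED = new String("<deleted>");
    private final String[] table;

    TombstoneDemo(int capacity) {
        table = new String[capacity];
    }

    private int start(String key) {
        return Math.floorMod(key.hashCode(), table.length);
    }

    // Inserts into the first null or DELETED slot (no duplicate check here).
    void add(String key) {
        int i = start(key);
        while (table[i] != null && table[i] != DELETED) {
            i = (i + 1) % table.length;
        }
        table[i] = key;
    }

    // Probing stops only at null; DELETED slots are skipped over.
    boolean contains(String key) {
        int i = start(key);
        while (table[i] != null) {
            if (table[i] != DELETED && table[i].equals(key)) {
                return true;
            }
            i = (i + 1) % table.length;
        }
        return false;
    }

    // useMarker = false shows the broken approach (slot set to null).
    boolean remove(String key, boolean useMarker) {
        int i = start(key);
        while (table[i] != null) {
            if (table[i] != DELETED && table[i].equals(key)) {
                table[i] = useMarker ? DELETED : null;
                return true;
            }
            i = (i + 1) % table.length;
        }
        return false;
    }

    public static void main(String[] args) {
        TombstoneDemo t = new TombstoneDemo(11);
        t.add("Dick");
        t.add("Sam");   // collides with Dick and is stored one slot later
        t.remove("Dick", true);
        System.out.println(t.contains("Sam"));
    }
}
```

Because the marker still occupies a slot, searches continue to probe past it, which is why removal does not shorten search time.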

38
Reducing Collisions by Quadratic Probing
  • Linear probing tends to form clusters of keys in
    the table, causing longer search chains
  • Quadratic probing can reduce the effect of
    clustering
  • Increments form a quadratic series
  • Disadvantages?
  • More work to calculate next index
    (multiplication, addition, and modular division)
  • Not all table elements are examined when looking
    for an insertion index
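
The last disadvantage can be observed by listing which slots a quadratic probe sequence reaches. The sketch assumes one common form of the sequence, index = (start + k*k) % tableSize for k = 0, 1, 2, ...; the slides do not specify the exact formula, and the names here are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class QuadraticProbeDemo {
    // Collects the distinct slots visited in the first tableSize probes.
    static Set<Integer> probedSlots(int start, int tableSize) {
        Set<Integer> seen = new LinkedHashSet<>();
        for (int k = 0; k < tableSize; k++) {
            seen.add((start + k * k) % tableSize);
        }
        return seen;
    }

    public static void main(String[] args) {
        // For a table of size 10, only some of the 10 slots are ever examined.
        System.out.println(probedSlots(0, 10));
    }
}
```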

39
Chaining
  • Chaining is an alternative to open addressing
  • Each table element references a linked list that
    contains all of the items that hash to the same
    table index
  • The linked list is often called a bucket
  • The approach is sometimes called bucket hashing
  • Only items that hash to the same table index
    will be examined when looking for an
    object

40
Chaining
  • Recall hashCode % 5
  • Chaining creates linked list for each collision
  • In this example
  • Linked list for Tom, Dick, Sam
  • Another linked list for Harry and Pete
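
A minimal chaining sketch that reproduces the two lists named above. The class name ChainedTableDemo and the insertion order of the names are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedTableDemo {
    // One linked list (bucket) per table index; colliding names share a bucket.
    static List<LinkedList<String>> build(String[] names, int tableSize) {
        List<LinkedList<String>> table = new ArrayList<>();
        for (int i = 0; i < tableSize; i++) {
            table.add(new LinkedList<String>());
        }
        for (String name : names) {
            int index = Math.floorMod(name.hashCode(), tableSize);
            table.get(index).add(name);
        }
        return table;
    }

    public static void main(String[] args) {
        String[] names = {"Tom", "Dick", "Harry", "Sam", "Pete"};
        List<LinkedList<String>> table = build(names, 5);
        for (int i = 0; i < table.size(); i++) {
            System.out.println(i + ": " + table.get(i));
        }
    }
}
```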

41
Chaining
42
Chaining
  • Plusses?
  • Conceptually simple
  • Minimizes table size
  • Good search efficiency
  • Minuses?
  • Overhead of linked lists (more storage)
  • More complex (perhaps)

43
Performance of Hash Tables
  • Load factor is number of filled cells divided by
    table size
  • Load factor has the greatest effect on performance
  • The lower the load factor, the better the
    performance
  • Why?
  • Less chance of collision in a sparsely populated
    table
  • But the smaller the load factor, the more wasted space
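
To put rough numbers on this tradeoff, the sketch below computes the load factor and uses the commonly cited approximations for the expected number of comparisons in a successful search: (1 + 1/(1 − L))/2 for linear probing and 1 + L/2 for chaining. These formulas are not in the transcript; they are included here as an assumption, and the method names are hypothetical.

```java
class LoadFactorDemo {
    // Load factor L = filled cells / table size.
    static double loadFactor(int filledCells, int tableSize) {
        return (double) filledCells / tableSize;
    }

    // Approximate comparisons for a successful search with linear probing (L < 1).
    static double probesLinear(double l) {
        return 0.5 * (1.0 + 1.0 / (1.0 - l));
    }

    // Approximate comparisons for a successful search with chaining.
    static double probesChaining(double l) {
        return 1.0 + l / 2.0;
    }

    public static void main(String[] args) {
        for (double l : new double[] {0.25, 0.5, 0.75, 0.9}) {
            System.out.println(l + ": linear " + probesLinear(l)
                    + ", chaining " + probesChaining(l));
        }
    }
}
```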

44
Performance of Hash Tables
45
Maps and Hashing
  • Maps use hash tables!
  • Hashing converts the key into an index
  • Index is the place where the corresponding value is stored
  • Makes it possible to search efficiently
  • Recall, O(1), on average
  • Without having an (explicit) index
  • Of course, there is some additional overhead

46
Implementing a Hash Table
47
Implementing a Hash Table
48
Implementation of Maps and Sets
  • Class Object implements methods hashCode and
    equals, so every class can access these methods
    unless it overrides them
  • Object.equals compares two objects based on their
    addresses, not their contents
  • Object.hashCode calculates an object's hash code
    based on its identity (typically its address),
    not its contents
  • Equal objects must have equal hash codes, so if
    you override the equals method, then you should
    also override the hashCode method
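
A sketch of a class that overrides both methods so that hash-based collections compare it by contents. PersonKey and its fields are hypothetical.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class PersonKey {
    private final String name;
    private final int id;

    PersonKey(String name, int id) {
        this.name = name;
        this.id = id;
    }

    // Two keys are equal when their contents match, not their addresses.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof PersonKey)) {
            return false;
        }
        PersonKey that = (PersonKey) other;
        return id == that.id && Objects.equals(name, that.name);
    }

    // Must agree with equals: equal objects produce equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(name, id);
    }

    public static void main(String[] args) {
        Set<PersonKey> set = new HashSet<>();
        set.add(new PersonKey("Jane", 1));
        // A distinct but equal object is found because both methods are overridden.
        System.out.println(set.contains(new PersonKey("Jane", 1)));
    }
}
```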

49
Implementing HashSetOpen
50
Implementing Java Map and Set Interfaces
  • The Java API uses a hash table to implement both
    the Map and Set interfaces (classes HashMap and
    HashSet)
  • The task of implementing the two interfaces is
    simplified by the inclusion of abstract classes
    AbstractMap and AbstractSet in the Collection
    hierarchy

51
Nested Interface Map.Entry
  • One requirement on the key-value pairs for a Map
    object is that they implement the interface
    Map.Entry<K, V>, which is an inner interface of
    interface Map
  • An implementer of the Map interface must contain
    an inner class that provides code for the methods
    in the table below

52
Additional Applications of Maps
  • Can implement the phone directory using a map

53
Additional Applications of Maps
  • Huffman Coding Problem
  • Use a map for creating an array of elements and
    replacing each input character by its bit string
    code in the output file
  • Frequency table
  • The key will be the input character
  • The value is the character code string
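
Building a character frequency table is the first step of Huffman coding, and a map keyed by the input character fits it naturally. This sketch covers only the counting step; the class name FrequencyDemo is hypothetical, and the use of TreeMap (to get sorted keys) is a choice made here, not taken from the slides.

```java
import java.util.Map;
import java.util.TreeMap;

class FrequencyDemo {
    // Key: input character. Value: number of occurrences in the text.
    static Map<Character, Integer> countChars(String text) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (char c : text.toCharArray()) {
            counts.merge(c, 1, Integer::sum); // insert 1, or add 1 to the existing count
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countChars("hello"));
    }
}
```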

54
Chapter Review
  • The Set interface describes an abstract data type
    that supports the same operations as a
    mathematical set
  • The Map interface describes an abstract data type
    that enables a user to access information
    corresponding to a specified key
  • A hash table uses hashing to transform an item's
    key into a table index so that insertions,
    retrievals, and deletions can be performed in
    expected O(1) time
  • A collision occurs when two keys map to the same
    table index
  • In open addressing, linear probing is often used
    to resolve collisions

55
Chapter Review
  • The best way to avoid collisions is to keep the
    table load factor relatively low by rehashing
    when the load factor reaches a value such as 0.75
  • In open addressing, you can't remove an element
    from the table when you delete it, but you must
    mark it as deleted
  • A set view of a hash table can be obtained
    through method entrySet
  • Two Java API implementations of the Map (Set)
    interface are HashMap (HashSet) and TreeMap
    (TreeSet)