Hash Tables PowerPoint PPT Presentation

Title: Hash Tables


1
Hash Tables
2
Goal: Faster Manipulation of Container Class
Elements
  • List
      • Search: O(N), Insert: O(N), Remove: O(N)
  • Binary Search Tree
      • Search: O(log2 N), Insert: O(log2 N), Remove:
        O(log2 N)
  • Can we do better? O(1)?
  • Yes we can!

3
Approach
  • Lists and trees are slowed down by comparing the
    element to find/insert/remove against other
    elements already in the structure
  • What if we could find/insert/remove a value in a
    structure by computing its position based only on
    the value itself?
  • Java: HashMap, HashSet, Hashtable classes
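
As a rough illustration of the Java classes named above, the sketch below uses HashMap and HashSet from java.util. The class name BuiltInHashingDemo and the helper countWords are hypothetical names chosen for this example only.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical demo class; only the java.util collections are real library APIs.
class BuiltInHashingDemo {
    // Counts how often each word occurs. Each merge() locates the entry
    // by hashing the key instead of comparing against every stored word.
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords(new String[] {"art", "rat", "art"});
        System.out.println(counts.get("art"));   // 2

        Set<Integer> ids = new HashSet<>();
        ids.add(37);
        ids.add(20);
        System.out.println(ids.contains(37));    // true
        ids.remove(37);
        System.out.println(ids.contains(37));    // false
    }
}
```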

4
Hash Function
  • A function H(val) that generates an index for
    storage (insertion)
  • Note: the same function can be used for
    retrieving and deleting
  • Java: hashCode()
  • Example
      • Given an array of 10 slots, we want to store
        numbers
      • A possible hash function: H(x) = x % 10
      • 37 goes in slot 7
      • 20 goes in slot 0
      • 58 goes in slot 8
  • Problem: what if we want to store 27? It also
    maps to slot 7. Values that generate the same
    result cause a collision (more later)
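
The example above can be sketched in a few lines of Java. The class name ModHashDemo is hypothetical, and the use of Math.floorMod (to keep the index non-negative for negative inputs) is an addition that the slide does not mention.

```java
// Hypothetical demo of H(x) = x % 10 with a 10-slot table.
class ModHashDemo {
    static final int TABLE_SIZE = 10;

    // H(x) = x % TABLE_SIZE. floorMod is an assumption added here so that
    // negative values still map to a valid slot.
    static int hash(int x) {
        return Math.floorMod(x, TABLE_SIZE);
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[TABLE_SIZE];
        for (int value : new int[] {37, 20, 58}) {
            table[hash(value)] = value;   // no collision handling yet
        }
        System.out.println(hash(37));     // 7
        System.out.println(hash(20));     // 0
        System.out.println(hash(58));     // 8
        // 27 maps to the slot already holding 37: a collision.
        System.out.println(hash(27) == hash(37));   // true
    }
}
```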

5
Qualities of a Good Hash Function
  • A good hash function should
      • Be easy to compute
      • Distribute values evenly (as if at random)
        throughout the table
  • E.g.
      • For integer ID numbers 100000, 100001, 100002,
        etc., (first + last digits) % table_size would
        not be good
      • For words, the sum of the first 3 letters would
        not be good (some combinations, such as STR,
        are much more common than others, such as XYZ)

6
Possible Hash Functions
  • Integers
      • Value % table_size
      • (Value * constant) % table_size
  • Strings
      • (Sum of ASCII values) % table_size
      • Problem: ART, RAT, and TAR all produce the
        same sum
      • (Weighted sum of ASCII values) % table_size
      • E.g., multiply each ASCII value by its position
        before summing
      • Can also use various numbers of bit shifts on
        each value for very efficient weighting
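
The string hashes listed above might look like the following sketch. The class and method names are hypothetical, the table size of 101 is an arbitrary choice for the demo, and the 5-bit shift in shiftHash is just one possible shift amount, not a recommendation from the slides.

```java
// Hypothetical sketch of three string hash functions.
class StringHashDemo {
    // (Sum of ASCII values) % tableSize: ignores character order.
    static int sumHash(String s, int tableSize) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return Math.floorMod(sum, tableSize);
    }

    // Weighted sum: each character is multiplied by its 1-based position.
    static int weightedHash(String s, int tableSize) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i) * (i + 1);
        }
        return Math.floorMod(sum, tableSize);
    }

    // Shift-based weighting: shift the running hash left before adding
    // the next character (the shift amount of 5 is an assumption).
    static int shiftHash(String s, int tableSize) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = (h << 5) + s.charAt(i);
        }
        return Math.floorMod(h, tableSize);   // floorMod guards against int overflow
    }

    public static void main(String[] args) {
        int size = 101;
        for (String w : new String[] {"ART", "RAT", "TAR"}) {
            System.out.println(w + " sum=" + sumHash(w, size)
                    + " weighted=" + weightedHash(w, size)
                    + " shift=" + shiftHash(w, size));
        }
    }
}
```

With the plain sum, all three anagrams land in the same slot; the weighted and shifted versions separate them because position now affects the result.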

7
Collision Resolution
  • Methods for resolving collisions
  • Open Hashing: values can share a slot
      • Chaining: make each location a list (or tree)
        instead of a single value holder
  • Closed Hashing: each value has its own slot
      • Linear probing: look for the next open slot
      • Quadratic probing: compute the next slot to
        try based on some quadratic function (e.g.,
        original position tried plus attempt^2)
      • e.g., try slot[hash_val], slot[hash_val + 1],
        slot[hash_val + 4], etc.
      • Double hashing: use a second hash function
        to compute the interval from the original
        position; repeatedly jump by this second
        amount until an open slot is found
      • e.g., the second hash function for numbers is
        the sum of digits: if there is a collision when
        inserting 42 at slot 6, next try slot 12, then
        slot 18, etc.
  • NOTE: treat the storage array as circular with
    these three probing mechanisms
      • e.g., linear probing: if the last slot is
        filled, try the first slot and continue
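
The three probing schemes differ only in how the next slot is computed, which the sketch below tries to make concrete. All names are hypothetical, and the table size of 23 used in main is an assumption picked so that the 42 example wraps around; the slides do not state a table size for it.

```java
// Hypothetical helpers returning the slot to try on a given attempt
// (attempt 0 is the original position). The % wraps the table circularly.
class ProbeSequences {
    static int linear(int start, int attempt, int tableSize) {
        return (start + attempt) % tableSize;
    }

    static int quadratic(int start, int attempt, int tableSize) {
        return (start + attempt * attempt) % tableSize;
    }

    // Second hash from the slides: sum of the decimal digits of the key.
    // Caveat: a result of 0 (or a multiple of tableSize) would never
    // advance, so a real second hash must avoid such step sizes.
    static int digitSum(int key) {
        int sum = 0;
        for (int n = Math.abs(key); n > 0; n /= 10) {
            sum += n % 10;
        }
        return sum;
    }

    static int doubleHash(int start, int attempt, int key, int tableSize) {
        return (start + attempt * digitSum(key)) % tableSize;
    }

    public static void main(String[] args) {
        // Inserting 42 after a collision at slot 6; the step is 4 + 2 = 6.
        for (int attempt = 0; attempt < 4; attempt++) {
            System.out.print(doubleHash(6, attempt, 42, 23) + " ");   // 6 12 18 1
        }
        System.out.println();
        System.out.println(linear(9, 1, 10));   // 0: the last slot wraps to the first
    }
}
```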

8
Hash Table Size
  • What if we use double hashing with a table size
    of 10 and the second hash function generates 5?
      • The only slots you'll try are original % 10
        and (original + 5) % 10
      • The table could appear full even if only two
        slots are filled
  • Solution
      • Table size should always be prime!
      • (or at least relatively prime to the step size)
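
A small experiment can show why the table size matters. The sketch below, with the hypothetical helper slotsVisited, follows a fixed step around a circular table until it returns to a slot it has already seen.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: which slots can a fixed step size ever reach?
class ProbeCoverage {
    static Set<Integer> slotsVisited(int start, int step, int tableSize) {
        Set<Integer> visited = new LinkedHashSet<>();
        int slot = start % tableSize;
        // add() returns false once a slot repeats, which ends the cycle.
        while (visited.add(slot)) {
            slot = (slot + step) % tableSize;
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(slotsVisited(3, 5, 10));          // [3, 8]
        System.out.println(slotsVisited(3, 5, 11).size());   // 11
        System.out.println(slotsVisited(3, 3, 10).size());   // 10
    }
}
```

With a step of 5 in a table of 10, only two slots are ever tried. A prime size of 11 reaches every slot, and so does a step of 3 in a table of 10, since 3 and 10 share no common factor.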

9
Load Factor
  • The load factor for a table is the percentage of
    slots used in the table
  • Performance degrades for the various collision
    resolution methods at different load factors
      • Open Hashing: 100%
      • Linear/Quadratic Probing: 50%
      • Double Hashing: close to 100%
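
One way to see the degradation for linear probing is a small simulation. In the sketch below a seeded random slot stands in for a hash function, and the class name, the table size of 1009, and the seed are all assumptions made for the demo. It reports the average number of slots examined per insertion while filling the table to a given load factor.

```java
import java.util.Random;

// Hypothetical simulation of linear probing at different load factors.
class LoadFactorDemo {
    static double averageProbes(int tableSize, double loadFactor, long seed) {
        int toInsert = (int) (tableSize * loadFactor);
        if (toInsert < 1 || toInsert > tableSize) {
            throw new IllegalArgumentException("load factor must fill 1.." + tableSize + " slots");
        }
        boolean[] used = new boolean[tableSize];
        Random rng = new Random(seed);   // stands in for a well-mixed hash function
        long probes = 0;
        for (int i = 0; i < toInsert; i++) {
            int slot = rng.nextInt(tableSize);
            probes++;                    // the original position counts as one probe
            while (used[slot]) {         // linear probing, treating the array as circular
                slot = (slot + 1) % tableSize;
                probes++;
            }
            used[slot] = true;
        }
        return (double) probes / toInsert;
    }

    public static void main(String[] args) {
        for (double lf : new double[] {0.25, 0.5, 0.75, 0.9}) {
            System.out.printf("load %.2f -> %.2f probes per insert%n",
                    lf, averageProbes(1009, lf, 1L));
        }
    }
}
```

The exact figures depend on the seed, but the average should climb noticeably as the table fills beyond about half.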

10
Lazy Deletion
  • If deleting a value in a collision chain, it's
    inefficient to shift everything back
  • Solution: keep an extra field in each location
    in addition to the value
  • Three possible values
      • Empty (stop if searching/inserting)
      • Present (when searching, keep looking if this
        is not the value sought)
      • Deleted (keep looking if you see this when
        searching, but stop and use this location if
        inserting)
  • Now we can simply change the field from Present
    to Deleted when removing a value
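
The three states can be sketched as a small linear-probing table of ints. This is a minimal illustration under stated assumptions, not a full implementation: the class name, the value % size hash, and the choice of linear probing are all assumptions, and insert does not check for duplicates.

```java
import java.util.Arrays;

// Hypothetical linear-probing table with lazy deletion.
class LazyDeletionTable {
    enum State { EMPTY, PRESENT, DELETED }

    private final int[] values;
    private final State[] states;

    LazyDeletionTable(int size) {
        values = new int[size];
        states = new State[size];
        Arrays.fill(states, State.EMPTY);
    }

    private int hash(int value) {
        return Math.floorMod(value, values.length);
    }

    // Returns the slot holding value, or -1 if it is absent.
    int find(int value) {
        int slot = hash(value);
        for (int i = 0; i < values.length; i++) {
            if (states[slot] == State.EMPTY) return -1;   // end of the chain
            if (states[slot] == State.PRESENT && values[slot] == value) return slot;
            slot = (slot + 1) % values.length;            // DELETED or other value: keep looking
        }
        return -1;
    }

    // Uses the first EMPTY or DELETED slot; returns false if none is free.
    boolean insert(int value) {
        int slot = hash(value);
        for (int i = 0; i < values.length; i++) {
            if (states[slot] != State.PRESENT) {
                values[slot] = value;
                states[slot] = State.PRESENT;
                return true;
            }
            slot = (slot + 1) % values.length;
        }
        return false;
    }

    // Marks the slot DELETED instead of shifting later entries back.
    boolean remove(int value) {
        int slot = find(value);
        if (slot < 0) return false;
        states[slot] = State.DELETED;
        return true;
    }
}
```

For instance, in a table of size 10, inserting 37 and then 27 places them in slots 7 and 8. After removing 37, a search for 27 still succeeds because it passes over the Deleted marker in slot 7 rather than stopping there, and a later insert of 47 reuses slot 7.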