Hash Table - PowerPoint PPT Presentation

Slides: 13
Provided by: sfu5

Transcript and Presenter's Notes

Title: Hash Table


1
IAT 800
  • Hash Table
  • Bucket Sort

2
Hash Table
  • An array in which items are not stored
    consecutively; their place of storage is
    calculated using the key and a hash function
  • Hashed key: the result of applying a hash
    function to a key
  • Keys and entries are scattered throughout the
    array

[Diagram: key → hash function → array index, shown for keys 4, 10 and 123 with their entries]
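The mapping in the diagram (key → hash function → array index) can be sketched in a few lines of Java. This is a minimal sketch that assumes String keys and Java's built-in `String.hashCode()` as the hash function; the slides do not prescribe either, and the class and method names are hypothetical.

```java
// Minimal sketch: compute a table position from a key.
// Assumptions: String keys, String.hashCode() as the hash function,
// hypothetical names HashIndexDemo / indexFor.
class HashIndexDemo {
    // Math.floorMod keeps the result in 0..tableSize-1 even when
    // hashCode() is negative (the % operator can return a negative value).
    static int indexFor(String key, int tableSize) {
        return Math.floorMod(key.hashCode(), tableSize);
    }

    public static void main(String[] args) {
        String[] keys = {"apples", "guava", "pears"};
        for (String k : keys) {
            // The same key always produces the same index.
            System.out.println(k + " -> " + indexFor(k, 10));
        }
    }
}
```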
3
Hashing
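  • insert: calculate the place of storage, insert a
    TableNode: O(1) on average
  • find: calculate the place of storage, retrieve the
    entry: O(1) on average
  • remove: calculate the place of storage, set it to
    null: O(1) on average (with open addressing, a
    "deleted" marker is used instead of null so that
    later searches do not stop too early)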
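The three operations above each compute one position and touch one array slot. The toy sketch below makes that concrete. It assumes integer keys, String entries and `key mod tableSize` as the hash, and it uses a hypothetical `TableNode` class. Collisions are deliberately ignored (a colliding insert simply overwrites), so setting a slot to null on remove is safe here; the following slides deal with collisions.

```java
// Toy sketch of slide 3's O(1) operations; collisions are NOT handled.
// Assumptions: int keys, String entries, hash = key mod table size,
// hypothetical classes TableNode and SimpleHashTable.
class TableNode {
    final int key;
    final String entry;
    TableNode(int key, String entry) { this.key = key; this.entry = entry; }
}

class SimpleHashTable {
    private final TableNode[] table;

    SimpleHashTable(int size) { table = new TableNode[size]; }

    // Calculate the place of storage for a key.
    private int hash(int key) { return Math.floorMod(key, table.length); }

    // insert: calculate place of storage, insert TableNode (overwrites on collision)
    void insert(int key, String entry) {
        table[hash(key)] = new TableNode(key, entry);
    }

    // find: calculate place of storage, retrieve entry (null if absent)
    String find(int key) {
        TableNode node = table[hash(key)];
        return (node != null && node.key == key) ? node.entry : null;
    }

    // remove: calculate place of storage, set it to null
    void remove(int key) {
        int i = hash(key);
        if (table[i] != null && table[i].key == key) table[i] = null;
    }
}
```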

4
Hashing example
  • 10 stock details, 10 table positions
  • Stock numbers from 0 to 999
  • Use the hash function stock no. / 100 (integer
    division)
  • What if we now insert stock no. 350?
  • Position 3 is occupied: there is a collision
  • Collision resolution strategy: insert in the next
    free position (linear probing); position 4 is also
    occupied, so 350 goes to position 5
  • Given a stock number, we find the stock by using
    the hash function again, and use the collision
    resolution strategy if necessary

  Position  Key  Entry
  0         85   85, apples
  3         323  323, guava
  4         462  462, pears
  5         350  350, oranges
  9         912  912, papaya
  (positions 1, 2, 6, 7 and 8 are empty)
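The stock example can be replayed with a small linear-probing table. This is a sketch that assumes stock numbers 0 to 999 (so `stockNo / 100` is always a valid position) and uses hypothetical names. Removal is left out because, with linear probing, simply nulling a slot would cut off the probe sequence for keys stored after it.

```java
// Sketch of the slide's example: 10 positions, hash = stockNo / 100,
// linear probing on collision. Assumes 0 <= stockNo <= 999.
class StockTable {
    private final int[] keys = new int[10];
    private final String[] entries = new String[10];
    private final boolean[] used = new boolean[10];

    // Hash function from the slide (integer division).
    private int hash(int stockNo) { return stockNo / 100; }

    // Returns the position where the item was stored.
    int insert(int stockNo, String entry) {
        int pos = hash(stockNo);
        for (int probes = 0; probes < keys.length; probes++) {
            if (!used[pos]) {
                used[pos] = true;
                keys[pos] = stockNo;
                entries[pos] = entry;
                return pos;
            }
            pos = (pos + 1) % keys.length; // linear probing: try the next position
        }
        throw new IllegalStateException("table is full");
    }

    // Hash again, then follow the same probe sequence until the key
    // or an empty position is found.
    String find(int stockNo) {
        int pos = hash(stockNo);
        for (int probes = 0; probes < keys.length && used[pos]; probes++) {
            if (keys[pos] == stockNo) return entries[pos];
            pos = (pos + 1) % keys.length;
        }
        return null;
    }

    public static void main(String[] args) {
        StockTable t = new StockTable();
        t.insert(85, "apples");
        t.insert(323, "guava");
        t.insert(462, "pears");
        t.insert(912, "papaya");
        System.out.println(t.insert(350, "oranges")); // 5 (positions 3 and 4 are taken)
        System.out.println(t.find(350));              // oranges
    }
}
```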
5
Hashing performance
  • The hash function
    • Ideally, it should distribute keys and entries
      evenly throughout the table
    • It should minimize collisions, where the position
      given by the hash function is already occupied
  • The collision resolution strategy
    • Separate chaining: chain together several
      keys/entries in each position
    • Open addressing: store the key/entry in a
      different position
  • The size of the table
    • Too big wastes memory; too small increases
      collisions and may eventually force rehashing
      (copying into a larger table)
    • It should suit the hash function used; a prime
      number is usually best
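Rehashing (copying into a larger table) can be sketched as follows. The code assumes integer keys, linear probing, a load limit of about one half, and growth to the next prime at least twice the old size. These choices are illustrative and not taken from the slides. Every key is re-hashed during the copy because its position depends on the table size.

```java
// Sketch of rehashing with a prime table size.
// Assumptions (not from the slides): int keys, linear probing,
// at most ~half full, grow to the next prime >= 2 * old size.
class RehashingSet {
    private Integer[] table = new Integer[7];
    private int count = 0;

    int capacity() { return table.length; }

    boolean contains(int key) {
        int pos = Math.floorMod(key, table.length);
        while (table[pos] != null) {          // table is never full, so this ends
            if (table[pos] == key) return true;
            pos = (pos + 1) % table.length;
        }
        return false;
    }

    void add(int key) {
        if (contains(key)) return;
        if (count + 1 > table.length / 2) rehash(); // keep the table at most half full
        place(table, key);
        count++;
    }

    // Insert with linear probing into the given array.
    private static void place(Integer[] t, int key) {
        int pos = Math.floorMod(key, t.length);
        while (t[pos] != null) pos = (pos + 1) % t.length;
        t[pos] = key;
    }

    // Copy every key into a larger, prime-sized table; positions change.
    private void rehash() {
        Integer[] bigger = new Integer[nextPrime(2 * table.length)];
        for (Integer k : table) {
            if (k != null) place(bigger, k);
        }
        table = bigger;
    }

    static int nextPrime(int n) {
        while (!isPrime(n)) n++;
        return n;
    }

    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }
}
```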

6
Hash function
  • Truncation
    • Ignore part of the key and use the rest as the
      array index (converting non-numeric parts)
    • A fast technique, but check for an even
      distribution throughout the table
  • Folding
    • Partition the key into several parts and then
      combine them in any convenient way
    • Unlike truncation, uses information from the
      whole key
  • Modular arithmetic (used by truncation, folding,
    and on its own)
    • To keep the calculated table position within the
      table, divide the position by the size of the
      table, and take the remainder as the new position

7
Hash Function Examples
  • Truncation: if students have a 9-digit
    identification number, take the last 3 digits as
    the table position
    • e.g. 925371622 becomes 622
  • Folding: split a 9-digit number into three
    3-digit numbers, and add them
    • e.g. 925371622 becomes 925 + 371 + 622 = 1918
  • Modular arithmetic: if the table size is 1000,
    the first example always stays within the table
    range, but the second example does not (it should
    be taken mod 1000)
    • e.g. 1918 mod 1000 = 918 (in Java: 1918 % 1000)
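The three techniques can be checked on the slide's 9-digit ID. This is a minimal sketch in which the method names are hypothetical and the ID is held in a `long`.

```java
// Truncation, folding and modular arithmetic on a 9-digit ID.
// Hypothetical method names; IDs are assumed non-negative.
class IdHashing {
    // Truncation: keep only the last 3 digits.
    static long truncate(long id) { return id % 1000; }

    // Folding: split into three 3-digit groups and add them.
    static long fold(long id) {
        long high = id / 1_000_000;        // first 3 digits
        long middle = (id / 1000) % 1000;  // middle 3 digits
        long low = id % 1000;              // last 3 digits
        return high + middle + low;
    }

    // Modular arithmetic: bring a computed position back inside the table.
    static long toTable(long position, int tableSize) { return position % tableSize; }

    public static void main(String[] args) {
        long id = 925371622L;
        System.out.println(truncate(id));            // 622
        System.out.println(fold(id));                // 1918
        System.out.println(toTable(fold(id), 1000)); // 918
    }
}
```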

8
Choosing the table size to minimize collisions
  • As the number of elements in the table increases,
    the likelihood of a collision increases, so make
    the table as large as practical
  • If the table size is 100 and all the hashed keys
    are divisible by 10, only the 10 positions that
    are multiples of 10 can ever be used, so there
    will be many collisions!
  • Particularly bad if the table size is a power of a
    small integer such as 2 or 10
  • More generally, collisions may be more frequent
    if
    • gcd(hashed keys, table size) > 1
  • Therefore, make the table size a prime number
    (then gcd = 1 for every key that is not a multiple
    of the table size)

Collisions may still happen, so we need a
collision resolution strategy
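The table-size argument can be tested by counting how many distinct positions a set of keys reaches under `key % tableSize`. The sketch assumes the keys 0, 10, 20, ..., 990, which are all divisible by 10 as in the slide's scenario. It compares table size 100 with the prime 101.

```java
import java.util.HashSet;
import java.util.Set;

// Count the distinct positions used by keys under position = key % tableSize.
// Assumption: keys 0, 10, ..., 990 (all divisible by 10).
class TableSizeDemo {
    static int distinctPositions(int[] keys, int tableSize) {
        Set<Integer> used = new HashSet<>();
        for (int k : keys) used.add(k % tableSize);
        return used.size();
    }

    static int[] multiplesOfTen() {
        int[] keys = new int[100];
        for (int i = 0; i < keys.length; i++) keys[i] = 10 * i;
        return keys;
    }

    public static void main(String[] args) {
        int[] keys = multiplesOfTen();
        System.out.println(distinctPositions(keys, 100)); // 10, since gcd(10, 100) = 10
        System.out.println(distinctPositions(keys, 101)); // 100, since 101 is prime
    }
}
```

With size 100, the 100 keys are squeezed into 10 positions. With the prime size 101, every key lands in its own position.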
9
Collision resolution: chaining
  • Each table position is a linked list
  • Add the keys and entries anywhere in the list
    (the front is easiest)
  • Advantages over open addressing
    • Simpler insertion and removal
    • Array size is not a limitation (but you should
      still minimize collisions: make the table size
      roughly equal to the expected number of keys
      and entries)
  • Disadvantage
    • Memory overhead is large if entries are small

[Diagram: chaining, shown for keys 4, 10 and 123. Callout: "No need to change position!"]
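Separate chaining can be sketched with one linked list per position and new entries added at the front. This assumes String keys and entries, `String.hashCode()` as the hash, and `java.util.LinkedList` for the chains. The names are hypothetical.

```java
import java.util.LinkedList;

// Sketch of separate chaining: each table position is a linked list.
// Assumptions: String keys/entries, String.hashCode(), java.util.LinkedList.
class ChainedHashTable {
    private static final class Node {
        final String key;
        String entry;
        Node(String key, String entry) { this.key = key; this.entry = entry; }
    }

    private final LinkedList<Node>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int size) {
        buckets = (LinkedList<Node>[]) new LinkedList[size];
        for (int i = 0; i < size; i++) buckets[i] = new LinkedList<>();
    }

    private LinkedList<Node> bucketFor(String key) {
        return buckets[Math.floorMod(key.hashCode(), buckets.length)];
    }

    void insert(String key, String entry) {
        LinkedList<Node> chain = bucketFor(key);
        for (Node n : chain) {
            if (n.key.equals(key)) { n.entry = entry; return; } // update existing key
        }
        chain.addFirst(new Node(key, entry)); // front of the list is easiest
    }

    String find(String key) {
        for (Node n : bucketFor(key)) {
            if (n.key.equals(key)) return n.entry;
        }
        return null;
    }

    // Removal only unlinks a node; no other entry changes position.
    boolean remove(String key) {
        return bucketFor(key).removeIf(n -> n.key.equals(key));
    }
}
```

Even a table with a single position still works, because all keys simply share one chain. That is the sense in which array size is not a hard limit.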
10
Applications of Hashing
  • Compilers use hash tables to keep track of
    declared variables
  • A hash table can be used for on-line spelling
    checkers: if misspelling detection (rather than
    correction) is important, an entire dictionary
    can be hashed and words checked in constant time
    on average
  • Hash functions can be used to quickly check for
    inequality: if two elements hash to different
    values, they must be different (equal hash values
    do not prove the elements are equal)
  • Storing sparse data
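Two of these applications can be sketched with Java's standard `HashSet` and `String.hashCode()`. A tiny hard-coded word list stands in for a real dictionary, and the method names are hypothetical. The strings "Aa" and "BB" illustrate the caveat that equal hash values do not prove equality: both have `hashCode()` 2112.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Sketch of two applications. Assumptions: a toy dictionary and
// String.hashCode() as the hash function; hypothetical method names.
class HashingApplications {
    // Misspelling detection: each HashSet lookup takes constant time on average.
    static List<String> misspelled(Set<String> dictionary, List<String> words) {
        List<String> unknown = new ArrayList<>();
        for (String w : words) {
            if (!dictionary.contains(w.toLowerCase(Locale.ROOT))) unknown.add(w);
        }
        return unknown;
    }

    // Different hashes prove the strings differ; equal hashes prove nothing.
    static boolean definitelyDifferent(String a, String b) {
        return a.hashCode() != b.hashCode();
    }

    public static void main(String[] args) {
        Set<String> dictionary = new HashSet<>(Arrays.asList("hash", "table", "key", "entry"));
        System.out.println(misspelled(dictionary, Arrays.asList("Hash", "tabel", "key"))); // [tabel]
        System.out.println(definitelyDifferent("Aa", "BB")); // false, although "Aa" and "BB" differ
    }
}
```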

11
When not to use hashing?
  • Hash tables are very good if there is a need for
    many searches in a reasonably stable table
  • Hash tables are not so good if there are many
    insertions and deletions, or if table traversals
    are needed
  • If there is more data than available memory, then
    use a tree
  • Also, hashing is very slow for any operations
    that require the entries to be sorted
    • e.g. finding the minimum key means examining
      every position in the table
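The "find the minimum key" case can be illustrated by putting the same stock data in a `HashMap` and a `TreeMap`. This sketch assumes these two standard Java classes stand in for "hash table" and "tree". The hash table has to scan every key. The tree keeps keys ordered, so `firstKey()` walks down one path.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Sketch: minimum key in a hash table vs. an ordered tree.
// Assumption: HashMap and TreeMap stand in for the two structures.
class MinKeyDemo {
    // Keys are scattered across the table, so every key is examined: O(n).
    static int minByScan(Map<Integer, String> table) {
        return Collections.min(table.keySet());
    }

    // Keys are kept in sorted order, so the leftmost key is the minimum: O(log n).
    static int minFromTree(TreeMap<Integer, String> tree) {
        return tree.firstKey();
    }

    public static void main(String[] args) {
        int[] stock = {462, 85, 912, 350, 323};
        String[] names = {"pears", "apples", "papaya", "oranges", "guava"};
        Map<Integer, String> hashed = new HashMap<>();
        TreeMap<Integer, String> tree = new TreeMap<>();
        for (int i = 0; i < stock.length; i++) {
            hashed.put(stock[i], names[i]);
            tree.put(stock[i], names[i]);
        }
        System.out.println(minByScan(hashed)); // 85
        System.out.println(minFromTree(tree)); // 85
    }
}
```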

12
Bucket Sort
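  • For each item to be sorted, compute
    • entryIndex = key / tableSize (integer division;
      the index fits in the table only if keys are
      smaller than tableSize × tableSize)
  • Chain entries on collision
  • Result: each table entry has all the entries in a
    range of key values
  • For some problems, this is enough
    • e.g. collision detection
  • For a full sort, sort each chain and concatenate
    the chains in table order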
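Bucket sort can be sketched using the slide's formula `entryIndex = key / tableSize`. The sketch assumes non-negative int keys smaller than `tableSize * tableSize`, so every index is in range. The `buckets` step alone yields the chains grouped by key range. The `sort` step finishes by sorting each chain and concatenating them in order. The names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

// Sketch of bucket sort with entryIndex = key / tableSize.
// Assumption: 0 <= key < tableSize * tableSize.
class BucketSortDemo {
    static List<List<Integer>> buckets(int[] keys, int tableSize) {
        List<List<Integer>> table = new ArrayList<>();
        for (int i = 0; i < tableSize; i++) table.add(new LinkedList<>());
        for (int key : keys) {
            int entryIndex = key / tableSize; // keys in the same range share a bucket
            table.get(entryIndex).add(key);   // chain entries on collision
        }
        return table;
    }

    // Full sort: sort each bucket, then concatenate buckets in table order.
    static List<Integer> sort(int[] keys, int tableSize) {
        List<Integer> result = new ArrayList<>();
        for (List<Integer> bucket : buckets(keys, tableSize)) {
            Collections.sort(bucket);
            result.addAll(bucket);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] keys = {42, 7, 99, 15, 40, 3, 71};
        System.out.println(buckets(keys, 10).get(4)); // [42, 40]
        System.out.println(sort(keys, 10));           // [3, 7, 15, 40, 42, 71, 99]
    }
}
```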
