Hash Tables - PowerPoint PPT Presentation

Slides: 34
Provided by: sfu5

1
IAT 355
  • Hash Tables
  • Binary Search
  • Sorting

2
Data Structures
  • With a collection of data, we often want to do
    many things
  • Organize
  • Iterate
  • Add new
  • Delete old
  • Search

3
Data Structures
  • It is built to enable fast searching

    What      LinkedList   Tree         HashTable
    Store     Light        Less light   Medium
    Iterate   simple       complex      extra work
    Add       O(1)         O(lg N)      O(1)
    Delete    O(1)         O(lg N)      O(1)
    Search    O(N)         O(lg N)      O(1)

4
Hash Table
  • An array in which items are not stored
    consecutively - their place of storage is
    calculated using the key and a hash function
  • Hashed key: the result of applying a hash
    function to a key
  • Keys and entries are scattered throughout the
    array

[Figure: a key is passed through the hash function to give the array index at which the key and entry are stored]

5
Hashing
  • insert: compute location, insert TableNode, O(1)
  • find: compute location, retrieve entry, O(1)
  • remove: compute location, set it to null, O(1)

[Figure: table of key/entry pairs]

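The three operations above can be sketched in Java roughly as follows. This is a minimal illustration, not the course's implementation: the class name `SimpleHashTable`, the fields of `TableNode`, and the choice of key modulo table size as the hash function are all assumptions, and collisions are simply overwritten here.

```java
/** Minimal sketch of a hash table with no collision handling (assumed design). */
final class SimpleHashTable {
    /** One key/entry pair; the field layout is a hypothetical choice. */
    static final class TableNode {
        final int key;
        final String entry;
        TableNode(int key, String entry) { this.key = key; this.entry = entry; }
    }

    private final TableNode[] table;

    SimpleHashTable(int size) { table = new TableNode[size]; }

    /** Assumed hash function: key modulo table size (never negative). */
    private int location(int key) { return Math.floorMod(key, table.length); }

    /** insert: compute location, store the node. Overwrites on collision. */
    void insert(int key, String entry) {
        table[location(key)] = new TableNode(key, entry);
    }

    /** find: compute location, return the entry, or null if the key is absent. */
    String find(int key) {
        TableNode node = table[location(key)];
        return (node != null && node.key == key) ? node.entry : null;
    }

    /** remove: compute location, set the slot to null if it holds this key. */
    void remove(int key) {
        int i = location(key);
        if (table[i] != null && table[i].key == key) table[i] = null;
    }
}
```

Each method does a fixed amount of work regardless of how many entries are stored, which is where the O(1) figures come from.
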
6
Hashing example
  • 10 stock details, 10 table positions
  • Stock numbers between 0 and 1000
  • Use hash function: stock no. / 100 (integer division)
  • What if we now insert stock no. 350?
  • Position 3 is occupied: there is a collision
  • Collision resolution strategy: insert in the next
    free position (linear probing)
  • Given a stock number, we find stock by using the
    hash function again, and use the collision
    resolution strategy if necessary

    position   key   entry
    0          85    85, apples
    3          323   323, guava
    4          462   462, pears
    5          350   350, oranges
    9          912   912, papaya

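The stock example can be reproduced with a short Java sketch of linear probing. The class name `StockTable` and its method names are assumptions for illustration; the sketch also assumes stock numbers stay in 0 to 999 so that stock no. / 100 is a valid position, and it does not handle duplicate keys or removal.

```java
/** Sketch of the 10-position stock table with linear probing (assumed names). */
final class StockTable {
    private final int[] keys = new int[10];
    private final String[] entries = new String[10]; // null marks a free position

    /** Hash function from the slide: stock no. / 100, using integer division. */
    static int hash(int stockNo) { return stockNo / 100; }

    /** Inserts the entry and returns the position used, or -1 if the table is full. */
    int insert(int stockNo, String entry) {
        int pos = hash(stockNo);
        for (int probes = 0; probes < entries.length; probes++) {
            if (entries[pos] == null) {
                keys[pos] = stockNo;
                entries[pos] = entry;
                return pos;
            }
            pos = (pos + 1) % entries.length; // linear probing: try the next position
        }
        return -1;
    }

    /** Finds an entry by hashing again and probing; returns null if absent. */
    String find(int stockNo) {
        int pos = hash(stockNo);
        for (int probes = 0; probes < entries.length; probes++) {
            if (entries[pos] == null) return null; // free position reached: not stored
            if (keys[pos] == stockNo) return entries[pos];
            pos = (pos + 1) % entries.length;
        }
        return null;
    }
}
```

Inserting 350 after 323 and 462 probes positions 3 and 4, finds both occupied, and settles in position 5.
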
7
Hashing performance
  • The hash function
  • Ideally, it should distribute keys and entries
    evenly throughout the table
  • It should minimize collisions, where the position
    given by the hash function is already occupied
  • The collision resolution strategy
  • Separate chaining: chain together several
    keys/entries in each position
  • Open addressing: store the key/entry in a
    different position
  • The size of the table
  • Too big will waste memory; too small will
    increase collisions and may eventually force
    rehashing (copying into a larger table)
  • Should be appropriate for the hash function used;
    a prime number is best

8
Hash function
  • Truncation
  • Ignore part of the key and use the rest as the
    array index (converting non-numeric parts)
  • A fast technique, but check for an even
    distribution throughout the table
  • Folding
  • Partition the key into several parts and then
    combine them in any convenient way
  • Unlike truncation, uses information from the
    whole key
  • Modular arithmetic (used by truncation, folding,
    and on its own)
  • To keep the calculated table position within the
    table, divide the position by the size of the
    table, and take the remainder as the new position

9
Hash Function Examples
  • Truncation: If students have a 9-digit
    identification number, take the last 3 digits as
    the table position
  • e.g. 925371622 becomes 622
  • Folding: Split a 9-digit number into three
    3-digit numbers, and add them
  • e.g. 925371622 becomes 925 + 371 + 622 = 1918
  • Modular arithmetic: If the table size is 1000,
    the first example always keeps within the table
    range, but the second example does not (it should
    be taken mod 1000)
  • e.g. 1918 mod 1000 = 918 (in Java: 1918 % 1000)
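The three techniques can be written as small Java methods. This is only a sketch for 9-digit, non-negative identification numbers; the class and method names (`HashFunctions`, `truncate`, `fold`, `toIndex`) are assumptions made for illustration.

```java
/** Sketch of the truncation, folding and modular-arithmetic examples (assumed names). */
final class HashFunctions {
    /** Truncation: keep only the last three digits of the ID. */
    static int truncate(long id) {
        return (int) (id % 1000);
    }

    /** Folding: split a 9-digit ID into three 3-digit parts and add them. */
    static int fold(long id) {
        int high = (int) (id / 1_000_000);   // first three digits
        int mid = (int) (id / 1000 % 1000);  // middle three digits
        int low = (int) (id % 1000);         // last three digits
        return high + mid + low;
    }

    /** Modular arithmetic: bring a non-negative hashed value into the table range. */
    static int toIndex(int hashed, int tableSize) {
        return hashed % tableSize;
    }
}
```

Because `fold` can return values up to 2997, its result needs `toIndex` before it can be used with a table of size 1000, whereas `truncate` is already in range.
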

10
Choosing the table size to minimize collisions
  • As the number of elements in the table increases,
    the likelihood of a collision increases - so make
    the table as large as practical
  • If the table size is 100, and all the hashed keys
    are divisible by 10, there will be many
    collisions!
  • Particularly bad if table size is a power of a
    small integer such as 2 or 10
  • More generally, collisions may be more frequent
    if greatest common divisor (hashed keys, table
    size) > 1
  • Therefore, make the table size a prime number
    (gcd = 1)

Collisions may still happen, so we need a
collision resolution strategy
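A small Java experiment makes the effect of table size visible. It is a sketch under stated assumptions: the hashed keys are the multiples of a fixed step, the position is the hashed key modulo the table size, and the names `TableSizeDemo` and `distinctPositions` are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Sketch: how many table positions are actually used for a given table size. */
final class TableSizeDemo {
    /** Counts the distinct positions occupied by the keys 0, step, 2*step, ... */
    static int distinctPositions(int keyCount, int step, int tableSize) {
        Set<Integer> used = new HashSet<>();
        for (int k = 0; k < keyCount; k++) {
            used.add((k * step) % tableSize); // position = hashed key mod table size
        }
        return used.size();
    }
}
```

With 100 keys that are all divisible by 10, a table of size 100 uses only 10 positions, while a table of the prime size 101 gives every key its own position.
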
11
Collision resolution: chaining
  • Each table position is a linked list
  • Add the keys and entries anywhere in the list
    (front easiest)
  • Advantages over open addressing
  • Simpler insertion and removal
  • Array size is not a limitation (but should still
    minimize collisions: make table size roughly
    equal to expected number of keys and entries)
  • Disadvantage
  • Memory overhead is large if entries are small

No need to change position!
[Figure: table of key/entry pairs, with colliding entries chained at the same position]

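Chaining can be sketched in Java with one linked list per table position. The names `ChainedHashTable`, `Node`, `put`, `get` and `remove` are assumptions for illustration, and the hash function (key modulo table size) is likewise an assumed choice.

```java
import java.util.LinkedList;

/** Sketch of separate chaining: every table position holds a linked list. */
final class ChainedHashTable {
    private static final class Node {
        final int key;
        String entry;
        Node(int key, String entry) { this.key = key; this.entry = entry; }
    }

    private final LinkedList<Node>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int size) {
        buckets = new LinkedList[size];
        for (int i = 0; i < size; i++) buckets[i] = new LinkedList<>();
    }

    /** Assumed hash function: key modulo table size (never negative). */
    private LinkedList<Node> bucketFor(int key) {
        return buckets[Math.floorMod(key, buckets.length)];
    }

    /** Adds the key/entry at the front of its chain, or updates an existing key. */
    void put(int key, String entry) {
        for (Node n : bucketFor(key)) {
            if (n.key == key) { n.entry = entry; return; }
        }
        bucketFor(key).addFirst(new Node(key, entry)); // front is easiest
    }

    /** Walks the chain for this key; returns null if the key is absent. */
    String get(int key) {
        for (Node n : bucketFor(key)) {
            if (n.key == key) return n.entry;
        }
        return null;
    }

    /** Removes the key from its chain; no other entry has to move. */
    boolean remove(int key) {
        return bucketFor(key).removeIf(n -> n.key == key);
    }
}
```

Removal only touches one chain, which is why insertion and removal are simpler here than with open addressing.
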
12
Applications of Hashing
  • Compilers use hash tables to keep track of
    declared variables
  • A hash table can be used for on-line spelling
    checkers: if misspelling detection (rather than
    correction) is important, an entire dictionary
    can be hashed and words checked in constant time
  • Hash functions can be used to quickly check for
    inequality: if two elements hash to different
    values they must be different
  • Storing sparse data

13
When to use hashing?
  • Good if:
  • Need many searches in a reasonably stable table
  • Not so good if:
  • Many insertions and deletions
  • Table traversals are needed
  • Need things in sorted order
  • More data than available memory: use a tree and
    store leaves on disk

14
Java
  • class HashMap
  • Provides hash table functionality in Java
  • More overhead, but the implementation comes for free
  • Take care when parameterizing it

15
Bucket Sort
  • For each item to be sorted, compute
  • entryIndex = key / bucketWidth, where bucketWidth
    is the range of key values divided by tableSize
  • Chain entries on collision
  • Result: each table entry has all the entries in a
    range of key values
  • For some problems, this is enough
  • E.g. collision detection

[Figure: table of key/entry pairs]

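The bucketing step can be sketched in Java as below. The names `BucketSortDemo` and `toBuckets` are hypothetical, and the sketch assumes non-negative integer keys smaller than `maxKey`, with each bucket covering an equal-width range of key values.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the bucketing step: each table entry collects one range of keys. */
final class BucketSortDemo {
    /** Distributes keys in [0, maxKey) into tableSize buckets of equal width. */
    static List<List<Integer>> toBuckets(int[] keys, int maxKey, int tableSize) {
        // Width of the key range covered by one bucket, rounded up.
        int bucketWidth = (maxKey + tableSize - 1) / tableSize;
        List<List<Integer>> table = new ArrayList<>();
        for (int i = 0; i < tableSize; i++) table.add(new ArrayList<>());
        for (int key : keys) {
            table.get(key / bucketWidth).add(key); // chain entries on collision
        }
        return table;
    }
}
```

Entries within a bucket stay in arrival order; a full sort would still have to order each bucket, but for range queries the buckets alone are often enough.
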
16
Bucket Sort
  • Frequently used in graphics and interactive apps
  • E.g. one bucket per pixel row
  • E.g. one bucket for each 64x64 pixel region
  • Put all data into buckets so that selection
    (search) can rapidly locate good candidates

17
Search
  • Frequently wish to organize data to support
    search
  • E.g. search for a single item
  • E.g. search for all items between 3 and 7

18
Search
  • Often want to search for an item in a list
  • In an unsorted list, must search linearly
  • In a sorted list, can use binary search

19
Binary Search
  • Start with index pointers at the start and end
    of the list
  • Compute the index midway between the two pointers

20
Binary Search
  • Compare middle item to search item
  • If search < mid, move end to mid - 1

21
Binary Search
  int[] Arr = new int[8];
  <populate array in sorted order>
  int search = 4;
  int start = 0, end = Arr.length - 1, mid;
  while( start <= end ) {
    mid = (start + end)/2;       // recompute the midpoint every iteration
    if( search == Arr[mid] )
      SUCCESS                    // found: stop searching
    if( search < Arr[mid] )
      end = mid - 1;
    else
      start = mid + 1;
  }
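As a runnable counterpart, here is a minimal Java method for the same search. The class name `BinarySearchDemo` and the convention of returning -1 when the item is absent are assumptions; the array must already be sorted in ascending order.

```java
/** Sketch of iterative binary search over a sorted int array. */
final class BinarySearchDemo {
    /** Returns the index of search in the sorted array arr, or -1 if absent. */
    static int binarySearch(int[] arr, int search) {
        int start = 0;
        int end = arr.length - 1;
        while (start <= end) {
            int mid = start + (end - start) / 2; // midpoint without int overflow
            if (arr[mid] == search) {
                return mid;                      // found
            }
            if (search < arr[mid]) {
                end = mid - 1;                   // continue in the lower half
            } else {
                start = mid + 1;                 // continue in the upper half
            }
        }
        return -1;                               // range is empty: not present
    }
}
```

The loop ends either with a match or when `start` passes `end`, which means the remaining range is empty.
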

22
Binary Search
  • Run Time
  • O( log(N) )
  • Every iteration chops list in half

23
Sorting
  • Need a sorted list to do binary search
  • Numerous sort algorithms

24
The family of sorting methods
Main sorting themes
  • Address-based sorting: Proxmap Sort, RadixSort
  • Comparison-based sorting:
      • Transposition sorting: BubbleSort
      • Diminishing increment sorting: ShellSort
      • Insert and keep sorted: Insertion sort, Tree sort
      • Divide and conquer: QuickSort, MergeSort
      • Priority queue sorting: Selection sort, Heap sort

25
Bubble sort: transposition sorting
  • Not a fast sort!
  • Code is small

for (int i = arr.length; i > 0; i--)
  for (int j = 1; j < i; j++)
    if (arr[j-1] > arr[j]) {
      int temp = arr[j-1];
      arr[j-1] = arr[j];
      arr[j] = temp;
    }

26
Divide and conquer sorting
MergeSort
QuickSort
27
QuickSort: divide and conquer sorting
  • As its name implies, QuickSort is among the
    fastest general-purpose sorting algorithms in
    practice
  • Its average running time is O(n log n); the
    worst case is O(n^2)
  • The idea is as follows
  • 1. If the number of elements to be sorted is 0 or
    1, then return
  • 2. Pick any element, v (this is called the pivot)
  • 3. Partition the other elements into two disjoint
    sets, S1 of elements ≤ v, and S2 of elements > v
  • 4. Return QuickSort (S1) followed by v followed
    by QuickSort (S2)

28
QuickSort example
5  1  4  2  10  3  9  15  12
Pick the middle element as the pivot, i.e., 10

29
Partitioning example
5  11  4  25  10  3  9  15  12
Pick the middle element as the pivot, i.e., 10

30
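Partitioning in progress (pivot 10 moved to the front):
10  4  5  25  11  3  9  15  12
After partitioning (pivot 10 in its final position):
9  4  5  3  10  25  11  15  12
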
31
Pseudocode for Quicksort
  procedure quicksort(array, left, right)
    if right > left
      select a pivot index (e.g. pivotIdx = left)
      pivotIdxNew = partition(array, left, right, pivotIdx)
      quicksort(array, left, pivotIdxNew - 1)
      quicksort(array, pivotIdxNew + 1, right)

32
Pseudo code for partitioning
pivotIdx = middle of array a
swap a[pivotIdx] with a[first]   // Move the pivot out of the way
swapPos = first + 1
for( i = swapPos; i <= last; i++ )
  if (a[i] < a[first])
    swap a[swapPos] with a[i]
    swapPos++
// Now move the pivot back to its rightful place
swap a[first] with a[swapPos-1]
return swapPos-1   // Pivot position

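The quicksort and partitioning pseudocode can be combined into a short runnable Java sketch. The class name `QuickSortDemo` and the `swap` helper are assumptions; the pivot is chosen inside `partition` as the middle element, following the partitioning slide, and the scan starts at the first position after the pivot so that no element is skipped.

```java
/** Sketch of QuickSort with a middle-element pivot (assumed names). */
final class QuickSortDemo {
    /** Sorts a[left..right] in place. */
    static void quicksort(int[] a, int left, int right) {
        if (right > left) {
            int pivotIdxNew = partition(a, left, right);
            quicksort(a, left, pivotIdxNew - 1);
            quicksort(a, pivotIdxNew + 1, right);
        }
    }

    /** Partitions a[first..last] around its middle element; returns the pivot position. */
    static int partition(int[] a, int first, int last) {
        int pivotIdx = first + (last - first) / 2;
        swap(a, pivotIdx, first);            // move the pivot out of the way
        int swapPos = first + 1;             // next position for a smaller element
        for (int i = first + 1; i <= last; i++) {
            if (a[i] < a[first]) {
                swap(a, swapPos, i);
                swapPos++;
            }
        }
        swap(a, first, swapPos - 1);         // move the pivot back to its rightful place
        return swapPos - 1;
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

Calling `QuickSortDemo.quicksort(a, 0, a.length - 1)` sorts the whole array.
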
33
Java
  • Sort and binary search provided on Arrays
  • sort(): ints, floats
  • sort( Object[] a, Comparator c )
  • you supply the Comparator object, which
    contains a function to compare 2 objects
  • binarySearch()
  • ints, floats
  • Search Objects with a Comparator object
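A short sketch of these `java.util.Arrays` calls. The class and method names (`ArraysDemo`, `sortedCopy`, `findIgnoringCase`) are hypothetical, and `String.CASE_INSENSITIVE_ORDER` is used here only as a convenient ready-made Comparator.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Sketch: Arrays.sort and Arrays.binarySearch on primitives and on objects. */
final class ArraysDemo {
    /** Returns a sorted copy of the values, leaving the input untouched. */
    static int[] sortedCopy(int[] values) {
        int[] copy = values.clone();
        Arrays.sort(copy);                 // primitive overload, ascending order
        return copy;
    }

    /** Sorts words in place ignoring case, then searches with the same Comparator. */
    static int findIgnoringCase(String[] words, String target) {
        Comparator<String> order = String.CASE_INSENSITIVE_ORDER;
        Arrays.sort(words, order);
        // binarySearch must use the same ordering the array was sorted with;
        // it returns a negative value when the target is not present.
        return Arrays.binarySearch(words, target, order);
    }
}
```

The search is only meaningful on an array sorted with the same ordering, which is why the Comparator is passed to both calls.
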