# CS 3343: Analysis of Algorithms - PowerPoint PPT Presentation


## CS 3343: Analysis of Algorithms


### CS 3343: Analysis of Algorithms, Lecture 15: Hash tables

Slides: 31
Provided by: Jian149
Transcript and Presenter's Notes


1
CS 3343: Analysis of Algorithms
• Lecture 15: Hash tables

2
Hash Tables
• Motivation: symbol tables
• A compiler uses a symbol table to relate symbols to associated data
• Symbols: variable names, procedure names, etc.
• Associated data: memory location, call graph, etc.
• For a symbol table (also called a dictionary), we care about search, insertion, and deletion
• We typically don't care about sorted order

3
Hash Tables
• More formally:
• Given a table T and a record x, with key (= symbol) and associated satellite data, we need to support:
• Insert(T, x)
• Delete(T, x)
• Search(T, k)
• We want these to be fast, but don't care about sorting the records
• The structure we will use is a hash table
• Supports all the above in O(1) expected time!

4
Hashing Keys
• In the following discussions we will consider all keys to be (possibly large) natural numbers
• When they are not, we have to interpret them as natural numbers.
• How can we convert ASCII strings to natural numbers for hashing purposes?
• Example: interpret a character string as an integer expressed in some radix notation. Suppose the string is CLRS:
• ASCII values: C = 67, L = 76, R = 82, S = 83.
• There are 128 basic ASCII values.
• So, CLRS = 67·128^3 + 76·128^2 + 82·128^1 + 83·128^0 = 141,764,947.
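
The radix-128 interpretation above can be sketched in a few lines of Python. The function name `string_to_key` is made up for this illustration; the only assumption is that every character has an ASCII code below the radix.

```python
def string_to_key(s: str, radix: int = 128) -> int:
    """Interpret an ASCII string as a natural number written in the given radix."""
    key = 0
    for ch in s:
        code = ord(ch)
        if code >= radix:
            raise ValueError(f"character {ch!r} does not fit in radix {radix}")
        # Horner's rule: shift the digits seen so far one place left, add the new one.
        key = key * radix + code
    return key


print(string_to_key("CLRS"))  # 141764947
```

Python integers are unbounded, so long strings simply produce large keys; in a fixed-width language the intermediate values would have to be reduced as they are accumulated.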

5
• Suppose:
• The range of keys is 0..m-1
• Keys are distinct
• The idea:
• Set up an array T[0..m-1] in which
• T[i] = x if x ∈ T and key[x] = i
• T[i] = NULL otherwise
• This is called a direct-address table
• Operations take O(1) time!
• So what's the problem?
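
A direct-address table is little more than an array indexed by key. The class below is a minimal sketch, not code from the lecture: the name `DirectAddressTable` and the choice to store the satellite data directly in the slot are assumptions made for illustration.

```python
class DirectAddressTable:
    """Array T[0..m-1]; slot i holds the data of the record whose key is i."""

    def __init__(self, m: int):
        # Initializing every slot costs Theta(m) time and space.
        self.slots = [None] * m

    def insert(self, key: int, data) -> None:
        self.slots[key] = data      # O(1)

    def delete(self, key: int) -> None:
        self.slots[key] = None      # O(1)

    def search(self, key: int):
        return self.slots[key]      # O(1); None means "not present"


t = DirectAddressTable(8)
t.insert(3, "x")
print(t.search(3), t.search(5))
```

Every operation is a single array access, but the constructor already hints at the cost: the array has one slot per possible key, whether or not that key is ever stored.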

6
• Direct addressing works well when the range m of keys is relatively small
• But what if the keys are 32-bit integers?
• Problem 1: direct-address table will have 2^32 entries, more than 4 billion
• Problem 2: even if memory is not an issue, the time to initialize the elements to NULL may be prohibitive
• Solution: map keys to smaller range 0..m-1
• This mapping is called a hash function
7
Hash Functions
• U: universe of all possible keys.
• Hash function h: mapping from U to the slots of a hash table T[0..m−1].
• h : U → {0, 1, ..., m−1}
• With direct addressing, key k maps to slot A[k].
• With hash tables, key k maps or "hashes" to slot T[h(k)].
• h(k) is the hash value of key k.

8
Hash Functions
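[Figure: the hash function h maps the actual keys K = {k1, ..., k5}, a small subset of the universe of keys U, to slots 0 .. m − 1 of table T, where |U| ≫ |K| and |U| ≫ m. Keys k2 and k5 hash to the same slot, h(k2) = h(k5): a collision.]
• Problem: collision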

9
Resolving Collisions
• How can we solve the problem of collisions?
• Solution 1: chaining

10
• Basic idea of open addressing (details in Section 11.4):
• To insert: if slot is full, try another slot (following a systematic and consistent strategy), and so on, until an open slot is found (probing)
• To search, follow the same sequence of probes as would be used when inserting the element
• If we reach an element with the correct key, return it
• If we reach a NULL pointer, the element is not in the table
• Good for fixed sets (adding but no deletion)
• Example: file names on a CD-ROM
• Table needn't be much bigger than n
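
The probing idea can be sketched as follows. The slide does not fix a probe strategy, so this example assumes linear probing (try the next slot, wrapping around) and integer keys hashed with k mod m; the class name `OpenAddressTable` is hypothetical. Deletion is deliberately left out, matching the "adding but no deletion" use case.

```python
class OpenAddressTable:
    """Open addressing with linear probing (an assumed strategy); no deletion."""

    def __init__(self, m: int):
        self.m = m
        self.slots = [None] * m     # each slot is None or a (key, data) pair

    def _probes(self, key: int):
        start = key % self.m        # assumed hash function: k mod m
        for i in range(self.m):
            yield (start + i) % self.m

    def insert(self, key: int, data) -> int:
        for j in self._probes(key):
            if self.slots[j] is None or self.slots[j][0] == key:
                self.slots[j] = (key, data)
                return j            # slot where the record was stored
        raise OverflowError("hash table is full")

    def search(self, key: int):
        for j in self._probes(key):
            if self.slots[j] is None:
                return None         # reached an empty slot: key is absent
            if self.slots[j][0] == key:
                return self.slots[j][1]
        return None                 # probed every slot without finding the key


t = OpenAddressTable(7)
print(t.insert(10, "a"), t.insert(17, "b"))  # 3 4
print(t.search(17), t.search(24))            # b None
```

Keys 10 and 17 both hash to slot 3, so the second insert moves on to slot 4. Searching stops at the first empty slot, which is exactly why naive deletion would break later lookups.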

11
Chaining
• Chaining puts elements that hash to the same slot in a linked list

[Figure: table T in which each slot points to a linked list of the actual keys (k1 through k8, drawn from K within the universe of keys U) that hash to that slot.]

12
Chaining
• How to insert an element?


13
Chaining
• How to delete an element?
• Use a doubly-linked list for efficient deletion


14
Chaining
• How to search for an element with a given key?


15
Hashing with Chaining
• Chained-Hash-Insert(T, x)
• Insert x at the head of list T[h(key[x])].
• Worst-case complexity: O(1).
• Chained-Hash-Delete(T, x)
• Delete x from the list T[h(key[x])].
• Worst-case complexity: proportional to length of list with singly-linked lists. O(1) with doubly-linked lists.
• Chained-Hash-Search(T, k)
• Search for an element with key k in list T[h(k)].
• Worst-case complexity: proportional to length of list.
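
The three chained operations can be sketched with doubly-linked nodes, so that deleting a node we already hold takes O(1). This is a minimal illustration, not the lecture's code: the names `Node` and `ChainedHashTable` are made up, and the hash function k mod m is an assumption.

```python
class Node:
    def __init__(self, key: int, data=None):
        self.key, self.data = key, data
        self.prev = self.next = None


class ChainedHashTable:
    def __init__(self, m: int):
        self.m = m
        self.heads = [None] * m          # heads[j] is the first node of chain j

    def _h(self, key: int) -> int:
        return key % self.m              # assumed hash function

    def insert(self, x: Node) -> None:
        """Insert x at the head of its chain: O(1) worst case."""
        j = self._h(x.key)
        x.prev, x.next = None, self.heads[j]
        if x.next is not None:
            x.next.prev = x
        self.heads[j] = x

    def search(self, key: int):
        """Walk the chain: time proportional to the chain length."""
        x = self.heads[self._h(key)]
        while x is not None and x.key != key:
            x = x.next
        return x

    def delete(self, x: Node) -> None:
        """Unlink node x: O(1) because x knows both of its neighbours."""
        if x.prev is not None:
            x.prev.next = x.next
        else:
            self.heads[self._h(x.key)] = x.next
        if x.next is not None:
            x.next.prev = x.prev
        x.prev = x.next = None


t = ChainedHashTable(5)
a, b = Node(3, "a"), Node(8, "b")    # 3 and 8 collide in slot 3
t.insert(a)
t.insert(b)
t.delete(b)
print(t.search(3).data, t.search(8))  # a None
```

Note that `delete` takes the node itself, not a key; deleting by key would first require a search, which costs time proportional to the chain length.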

16
Analysis of Chaining
• Assume simple uniform hashing: each key in table is equally likely to be hashed to any slot
• Given n keys and m slots in the table, the load factor α = n/m = average keys per slot
• What will be the average cost of an unsuccessful search for a key?
• A: Θ(1 + α) (Theorem 11.1)
• What will be the average cost of a successful search?
• A: Θ(2 + α/2) = Θ(1 + α) (Theorem 11.2)

17
Analysis of Chaining Continued
• So the cost of searching = O(1 + α)
• If the number of keys n is proportional to the number of slots in the table, what is α?
• A: n = O(m) ⇒ α = n/m = O(1)
• In other words, we can make the expected cost of searching constant if we make α constant
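
A small experiment can make the load factor concrete. The sketch below is only an illustration: the function name `chaining_stats`, the random keys, and the k mod m hash are all assumptions. It measures the average chain length, which is the number of elements an unsuccessful search scans when it lands in a uniformly chosen slot, and the average number of elements examined by a successful search.

```python
import random


def chaining_stats(n: int, m: int, seed: int = 0):
    """Hash n distinct random keys into m chains; return (alpha, unsuccessful, successful)."""
    rng = random.Random(seed)
    keys = rng.sample(range(10**9), n)
    lengths = [0] * m
    for k in keys:
        lengths[k % m] += 1              # assumed hash function: k mod m

    alpha = n / m
    # An unsuccessful search scans a whole chain; average over slots.
    avg_unsuccessful = sum(lengths) / m
    # In a chain of length L the elements sit at positions 1..L,
    # so finding each of them once costs L * (L + 1) / 2 in total.
    avg_successful = sum(L * (L + 1) / 2 for L in lengths) / n
    return alpha, avg_unsuccessful, avg_successful


print(chaining_stats(300, 100))
```

The average chain length equals α exactly, whatever the hash function does, because the chain lengths always sum to n. The quality of the hash function shows up in how evenly that total is spread, which is what the successful-search figure reflects.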

18
Choosing A Hash Function
• Clearly, choosing the hash function well is
crucial
• What will a worst-case hash function do?
• What will be the time to search in this case?
• What are desirable features of the hash function?
• Should distribute keys uniformly into slots
• Should not depend on patterns in the data

19
Hash Functions: The Division Method
• h(k) = k mod m
• In words: hash k into a table with m slots using the slot given by the remainder of k divided by m
• Example: m = 31 and k = 78, h(k) = 16.
• Disadvantage: value of m is critical
• Bad if keys bear relation to m
• Or if hash does not depend on all bits of k
• What happens to elements with adjacent values of k?
• Elements with adjacent keys are hashed to different slots: good
• What happens if m is a power of 2 (say 2^p)?
• What if m is a power of 10?
• Pick m = prime number not too close to a power of 2 (or 10)
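
The effect of the table size on the division method is easy to see in code. In this sketch the function name `h_div` and the sample keys are made up for illustration; the keys differ only in their high-order bits.

```python
def h_div(k: int, m: int) -> int:
    """Division method: the slot is the remainder of k divided by m."""
    return k % m


print(h_div(78, 31))                   # 16

keys = [0x10, 0x20, 0x30, 0x40]        # identical low 4 bits, different high bits
print([h_div(k, 16) for k in keys])    # [0, 0, 0, 0]
print([h_div(k, 31) for k in keys])    # [16, 1, 17, 2]
```

With m = 2^4 the hash is just the low 4 bits of k, so these keys all collide. With the prime m = 31 every bit of the key influences the slot.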

20
Hash Functions: The Multiplication Method
• For a constant A, 0 < A < 1:
• h(k) = ⌊m (kA mod 1)⌋ = ⌊m (kA − ⌊kA⌋)⌋

21
Hash Functions: The Multiplication Method
• For a constant A, 0 < A < 1:
• h(k) = ⌊m (kA mod 1)⌋ = ⌊m (kA − ⌊kA⌋)⌋
• Here kA mod 1 = kA − ⌊kA⌋ is the fractional part of kA
• Advantage: value of m is not critical
• Choose m = 2^p, for easier implementation

22
How to choose A?
• The multiplication method works with any legal value of A.
• Choose A not too close to 0 or 1
• Knuth: good choice for A = (√5 − 1)/2
• Example: m = 1024, k = 123, A ≈ 0.6180339887
• h(k) = ⌊1024 (123 · 0.6180339887 mod 1)⌋
• = ⌊1024 · 0.018180...⌋ = 18.
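
The worked example can be reproduced directly with floating-point arithmetic. This is a sketch for checking the numbers only; the function name `h_mul` is hypothetical.

```python
import math

A = (math.sqrt(5) - 1) / 2            # Knuth's suggested constant, about 0.6180339887


def h_mul(k: int, m: int, a: float = A) -> int:
    """Multiplication method: floor(m * fractional part of k*a)."""
    frac = (k * a) % 1.0
    return math.floor(m * frac)


print((123 * A) % 1.0)                # fractional part, about 0.01818
print(h_mul(123, 1024))               # 18
```

Floating point is fine for small keys like this one, but it loses precision for large k; a fixed-width integer formulation avoids that.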

23
Multiplication Method - Implementation
• Choose m = 2^p, for some integer p.
• Let the word size of the machine be w bits.
• Assume that k fits into a single word. (k takes w bits.)
• Let 0 < s < 2^w. (s takes w bits.)
• Restrict A to be of the form s/2^w.
• Let k · s = r1 · 2^w + r0.
• r1 holds the integer part of kA (⌊kA⌋) and r0 holds the fractional part of kA (kA mod 1 = kA − ⌊kA⌋).
• We don't care about the integer part of kA.
• So, just use r0, and forget about r1.

24
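Multiplication Method - Implementation
[Figure: the w-bit key k is multiplied by the w-bit value s = A · 2^w. The 2w-bit product consists of the high word r1 and the low word r0, with the binary point between them. h(k) is obtained by extracting p bits from r0.]
• We want ⌊m (kA mod 1)⌋.
• m = 2^p
• We could get that by shifting r0 to the left by p bits and then taking the p bits that were shifted to the left of the binary point.
• But, we don't need to shift. Just take the p most significant bits of r0.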
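
The word-level scheme above can be sketched with integer arithmetic only. The example assumes a word size of w = 32 and derives s from Knuth's constant; the names `W`, `S` and `h_mul_int` are made up for this illustration, and keys are assumed to satisfy 0 ≤ k < 2^w.

```python
W = 32                                          # assumed machine word size in bits
S = int(((5 ** 0.5 - 1) / 2) * 2 ** W)          # s = floor(A * 2^w) = 2654435769


def h_mul_int(k: int, p: int, w: int = W, s: int = S) -> int:
    """Return the p most significant bits of r0, where k*s = r1 * 2^w + r0."""
    r0 = (k * s) & ((1 << w) - 1)               # keep the low w bits, drop r1
    return r0 >> (w - p)                        # top p bits of r0


print(S)                      # 2654435769
print(h_mul_int(123, 10))     # 18, with m = 2^10 = 1024
```

For k = 123 the product is 76 · 2^32 + 78085091, so r1 = 76 is the discarded integer part and the top 10 bits of r0 give slot 18, the same slot as the floating-point calculation with m = 1024.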

25
Hash Functions: Worst Case Scenario
• Scenario:
• You are given an assignment to implement hashing
• In a blatant violation of the honor code, your partner:
• Picks a sequence of "worst-case" keys that all map to the same slot, causing your implementation to take O(n) time to search
• Exercise 11.2-5: when |U| > nm, for any fixed hashing function, one can always choose n keys that are hashed into the same slot.

26
Universal Hashing
• When attempting to defeat a malicious adversary, randomize the algorithm
• Universal hashing: pick a hash function randomly in a way that is independent of the keys that are actually going to be stored
• Pick a hash function randomly when the algorithm begins (not upon every insert!)
• Guarantees good performance on average, no matter what keys are stored
• Need a family of hash functions to choose from

27
Universal Hashing
• Let ℋ be a (finite) collection of hash functions
• that map a given universe U of keys
• into the range {0, 1, ..., m − 1}.
• ℋ is said to be universal if:
• for each pair of distinct keys x, y ∈ U, the number of hash functions h ∈ ℋ for which h(x) = h(y) is at most |ℋ|/m
• In other words:
• With a random hash function from ℋ, the chance of a collision between x and y is at most 1/m (x ≠ y)

28
Universal Hashing
• Theorem 11.3 (modified from textbook):
• Choose h from a universal family of hash functions
• Hash n keys into a table of m slots, n ≤ m
• Then the expected number of collisions involving a particular key x is less than 1
• Proof:
• For each pair of keys y, x, let c_yx = 1 if y and x collide, 0 otherwise
• E[c_yx] ≤ 1/m (by definition)
• Let C_x be the total number of collisions involving key x
• E[C_x] = Σ_{y ≠ x} E[c_yx] ≤ (n − 1)/m
• Since n ≤ m, we have E[C_x] < 1
• Implication: expected running time of insertion is Θ(1)

29
A Universal Hash Function
• Choose a prime number p that is larger than all possible keys
• Choose table size m = n
• Randomly choose two integers a, b, such that 1 ≤ a ≤ p − 1, and 0 ≤ b ≤ p − 1
• h_{a,b}(k) = ((ak + b) mod p) mod m
• Example: p = 17, m = 6
• h_{3,4}(8) = ((3·8 + 4) mod 17) mod 6 = 11 mod 6 = 5
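
The construction above translates almost directly into code. This is a minimal sketch: the helper names `h_ab` and `random_hash` are hypothetical, and p is assumed to be a prime larger than every key.

```python
import random


def h_ab(k: int, a: int, b: int, p: int, m: int) -> int:
    """h_{a,b}(k) = ((a*k + b) mod p) mod m."""
    return ((a * k + b) % p) % m


def random_hash(p: int, m: int, rng=random):
    """Draw a and b once, then return the fixed function k -> h_{a,b}(k)."""
    a = rng.randint(1, p - 1)     # 1 <= a <= p - 1
    b = rng.randint(0, p - 1)     # 0 <= b <= p - 1
    return lambda k: h_ab(k, a, b, p, m)


print(h_ab(8, 3, 4, 17, 6))       # 5, the example above

h = random_hash(17, 6)            # chosen once, when the table is created
print([h(k) for k in range(5)])
```

The random draw happens once, when the table is created, and the resulting function is then used for every insert and search.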

30
A Universal Hash Function
• Theorem 11.5: The family of hash functions H_{p,m} = {h_{a,b}} defined on the previous slide is universal
• Proof sketch:
• For any two distinct keys x, y, for a given h_{a,b},
• Let r = (ax + b) mod p, s = (ay + b) mod p.
• It can be shown that r ≠ s, and that different (a, b) result in different (r, s)
• x and y collide only when r mod m = s mod m
• For a given r, the number of values s such that r mod m = s mod m and r ≠ s is at most (p − 1)/m
• For a given r, and any randomly chosen s, prob(r ≠ s and r mod m = s mod m) ≤ ((p − 1)/m) / (p − 1) = 1/m
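
For small parameters the universality bound can be checked by brute force. The sketch below is an illustration, not a proof: it reuses p = 17 and m = 6 from the earlier example, takes the keys to be 0..p−1, and the function name `max_collisions` is made up.

```python
from itertools import combinations


def max_collisions(p: int, m: int) -> int:
    """Largest number of pairs (a, b) under which some two distinct keys collide."""
    worst = 0
    for x, y in combinations(range(p), 2):
        count = sum(
            1
            for a in range(1, p)
            for b in range(p)
            if ((a * x + b) % p) % m == ((a * y + b) % p) % m
        )
        worst = max(worst, count)
    return worst


p, m = 17, 6
family_size = p * (p - 1)               # number of (a, b) pairs: 272
worst = max_collisions(p, m)
print(worst, family_size / m)           # worst must not exceed |H|/m
```

For these parameters every pair of distinct keys collides under 32 of the 272 functions, below the bound 272/6 ≈ 45.3. The count is the same for every pair because each (a, b) corresponds to exactly one (r, s) with r ≠ s.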