CS 3343: Analysis of Algorithms - PowerPoint PPT Presentation

About This Presentation
Title:

CS 3343: Analysis of Algorithms

Description:

CS 3343: Analysis of Algorithms Lecture 15: Hash tables – PowerPoint PPT presentation

Slides: 31
Provided by: Jian149
Learn more at: http://cs.utsa.edu
Transcript and Presenter's Notes

Title: CS 3343: Analysis of Algorithms


1
CS 3343: Analysis of Algorithms
  • Lecture 15: Hash tables

2
Hash Tables
  • Motivation: symbol tables
  • A compiler uses a symbol table to relate symbols
    to associated data
  • Symbols: variable names, procedure names, etc.
  • Associated data: memory location, call graph,
    etc.
  • For a symbol table (also called a dictionary), we
    care about search, insertion, and deletion
  • We typically don't care about sorted order

3
Hash Tables
  • More formally:
  • Given a table T and a record x, with key (=
    symbol) and associated satellite data, we need to
    support:
  • Insert (T, x)
  • Delete (T, x)
  • Search (T, k)
  • We want these to be fast, but don't care about
    sorting the records
  • The structure we will use is a hash table
  • Supports all the above in O(1) expected time!

4
Hashing Keys
  • In the following discussions we will consider all
    keys to be (possibly large) natural numbers
  • When they are not, we have to interpret them as
    natural numbers.
  • How can we convert ASCII strings to natural
    numbers for hashing purposes?
  • Example: interpret a character string as an
    integer expressed in some radix notation. Suppose
    the string is CLRS.
  • ASCII values: C = 67, L = 76, R = 82, S = 83.
  • There are 128 basic ASCII values.
  • So, CLRS = 67·128^3 + 76·128^2 + 82·128^1 + 83·128^0
    = 141,764,947.
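
The radix-128 conversion above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function name `string_to_key` is hypothetical.

```python
def string_to_key(s: str, radix: int = 128) -> int:
    """Interpret an ASCII string as a natural number written in the given radix."""
    key = 0
    for ch in s:
        code = ord(ch)
        if code >= radix:
            raise ValueError(f"character {ch!r} does not fit in radix {radix}")
        # Horner's rule: shift the digits seen so far and append the new one.
        key = key * radix + code
    return key


print(string_to_key("CLRS"))  # 141764947
```

Python integers are unbounded, so long strings simply yield very large keys; in a fixed-width language the value would usually be reduced modulo the table size as it is built.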

5
Direct Addressing
  • Suppose:
  • The range of keys is 0..m-1
  • Keys are distinct
  • The idea:
  • Set up an array T[0..m-1] in which
  • T[i] = x if x ∈ T and key[x] = i
  • T[i] = NULL otherwise
  • This is called a direct-address table
  • Operations take O(1) time!
  • So what's the problem?
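
A direct-address table is small enough to sketch directly. The class and method names below are hypothetical, and records are stored as (key, data) tuples purely for illustration.

```python
class DirectAddressTable:
    """Direct-address table for distinct integer keys in the range 0..m-1."""

    def __init__(self, m: int):
        # One slot per possible key; initialization alone costs Theta(m).
        self.slots = [None] * m

    def insert(self, key: int, data) -> None:
        self.slots[key] = (key, data)   # O(1)

    def delete(self, key: int) -> None:
        self.slots[key] = None          # O(1)

    def search(self, key: int):
        return self.slots[key]          # O(1); None if the key is absent


table = DirectAddressTable(8)
table.insert(3, "three")
print(table.search(3))  # (3, 'three')
print(table.search(5))  # None
```

The constructor already hints at the problem raised next: the array must have one slot per possible key, whether or not the key is ever used.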

6
The Problem With Direct Addressing
  • Direct addressing works well when the range m of
    keys is relatively small
  • But what if the keys are 32-bit integers?
  • Problem 1: direct-address table will have 2^32
    entries, more than 4 billion
  • Problem 2: even if memory is not an issue, the
    time to initialize the elements to NULL may be
    an issue
  • Solution: map keys to smaller range 0..m-1
  • This mapping is called a hash function

7
Hash Functions
  • U: universe of all possible keys.
  • Hash function h: mapping from U to the slots of a
    hash table T[0..m-1].
  • h : U → {0, 1, ..., m-1}
  • With direct addressing, key k maps to slot A[k].
  • With hash tables, key k maps or "hashes" to slot
    T[h(k)].
  • h(k) is the hash value of key k.

8
Hash Functions
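[Figure: a hash function h maps keys from the universe U into slots 0..m-1 of table T, where |U| >> |K| and |U| >> m for the set K of actual keys. Keys k1, k3, and k4 land in separate slots; h(k2) = h(k5), a collision.]
  • Problem: collision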

9
Resolving Collisions
  • How can we solve the problem of collisions?
  • Solution 1: chaining
  • Solution 2: open addressing

10
Open Addressing
  • Basic idea (details in Section 11.4):
  • To insert: if slot is full, try another slot
    (following a systematic and consistent strategy),
    and so on, until an open slot is found (probing)
  • To search, follow same sequence of probes as
    would be used when inserting the element
  • If reach element with correct key, return it
  • If reach a NULL pointer, element is not in table
  • Good for fixed sets (adding but no deletion)
  • Example: file names on a CD-ROM
  • Table needn't be much bigger than n
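
The insert and search procedures above can be sketched as follows. The slides do not name a probing strategy, so this sketch assumes linear probing (try slot h(k), then h(k)+1, and so on, wrapping around) with h(k) = k mod m; the class name is hypothetical, and deletion is deliberately left out, matching the "adding but no deletion" use case.

```python
class OpenAddressTable:
    """Open addressing with linear probing (an assumed probe strategy)."""

    def __init__(self, m: int):
        self.m = m
        self.slots = [None] * m

    def _probe(self, key: int):
        # The same probe sequence is used for both insertion and search.
        start = key % self.m
        for i in range(self.m):
            yield (start + i) % self.m

    def insert(self, key: int) -> int:
        for j in self._probe(key):
            if self.slots[j] is None or self.slots[j] == key:
                self.slots[j] = key
                return j            # slot where the key was stored
        raise OverflowError("hash table is full")

    def search(self, key: int):
        for j in self._probe(key):
            if self.slots[j] is None:
                return None         # reached an empty slot: key is absent
            if self.slots[j] == key:
                return j
        return None                 # scanned the whole table


table = OpenAddressTable(7)
print(table.insert(10))  # 3
print(table.insert(17))  # 4  (slot 3 is taken, so probe the next slot)
print(table.search(17))  # 4
print(table.search(31))  # None
```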

11
Chaining
  • Chaining puts elements that hash to the same slot
    in a linked list

[Figure: table T resolving collisions by chaining. Each occupied slot of T points to a linked list of the keys from K (k1 through k8) that hash to that slot.]


12
Chaining
  • How to insert an element?

[Figure: table T resolving collisions by chaining. Each occupied slot of T points to a linked list of the keys from K (k1 through k8) that hash to that slot.]


13
Chaining
  • How to delete an element?
  • Use a doubly-linked list for efficient deletion

[Figure: table T resolving collisions by chaining. Each occupied slot of T points to a linked list of the keys from K (k1 through k8) that hash to that slot.]


14
Chaining
  • How to search for an element with a given key?

[Figure: table T resolving collisions by chaining. Each occupied slot of T points to a linked list of the keys from K (k1 through k8) that hash to that slot.]


15
Hashing with Chaining
  • Chained-Hash-Insert (T, x)
  • Insert x at the head of list T[h(key[x])].
  • Worst-case complexity: O(1).
  • Chained-Hash-Delete (T, x)
  • Delete x from the list T[h(key[x])].
  • Worst-case complexity: proportional to length of
    list with singly-linked lists; O(1) with
    doubly-linked lists.
  • Chained-Hash-Search (T, k)
  • Search for an element with key k in list T[h(k)].
  • Worst-case complexity: proportional to length of
    list.
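
A minimal sketch of the three chained operations follows. It assumes singly-linked chains and the hash function h(k) = k mod m, neither of which the slide fixes; deletion here takes a key and walks the chain, so it costs time proportional to the chain length (the singly-linked case above). The class names are hypothetical.

```python
class Node:
    """One element of a chain."""

    def __init__(self, key, data, next=None):
        self.key, self.data, self.next = key, data, next


class ChainedHashTable:
    def __init__(self, m: int):
        self.m = m
        self.heads = [None] * m      # heads[j] is the first node of chain j

    def _h(self, key: int) -> int:
        return key % self.m          # assumed hash function

    def insert(self, key, data) -> None:
        # Insert at the head of the chain: O(1) in the worst case.
        j = self._h(key)
        self.heads[j] = Node(key, data, self.heads[j])

    def search(self, key):
        # Walk the chain: proportional to its length in the worst case.
        node = self.heads[self._h(key)]
        while node is not None and node.key != key:
            node = node.next
        return node

    def delete(self, key) -> bool:
        # Singly-linked deletion: find the predecessor, then unlink.
        j = self._h(key)
        prev, node = None, self.heads[j]
        while node is not None and node.key != key:
            prev, node = node, node.next
        if node is None:
            return False
        if prev is None:
            self.heads[j] = node.next
        else:
            prev.next = node.next
        return True


table = ChainedHashTable(5)
for key, data in [(1, "a"), (6, "b"), (11, "c")]:   # all hash to slot 1
    table.insert(key, data)
print(table.search(6).data)   # b
print(table.delete(6))        # True
print(table.search(6))        # None
```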

16
Analysis of Chaining
  • Assume simple uniform hashing: each key in table
    is equally likely to be hashed to any slot
  • Given n keys and m slots in the table, the load
    factor α = n/m = average keys per slot
  • What will be the average cost of an unsuccessful
    search for a key?
  • A: Θ(1 + α) (Theorem 11.1)
  • What will be the average cost of a successful
    search?
  • A: Θ(2 + α/2) = Θ(1 + α) (Theorem 11.2)

17
Analysis of Chaining Continued
  • So the cost of searching = O(1 + α)
  • If the number of keys n is proportional to the
    number of slots in the table, what is α?
  • A: n = O(m) ⇒ α = n/m = O(1)
  • In other words, we can make the expected cost of
    searching constant if we make ? constant
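
To make the load factor concrete, the sketch below counts chain lengths for a small table. It assumes h(k) = k mod m and the keys 0..n-1, chosen only for illustration. An unsuccessful search that lands in a uniformly random slot scans that slot's whole chain, so the average number of elements it examines is the mean chain length, which is exactly α = n/m.

```python
def chain_lengths(keys, m: int):
    """Return the length of each chain when keys are hashed with k mod m."""
    lengths = [0] * m
    for k in keys:
        lengths[k % m] += 1
    return lengths


n, m = 50, 10
lengths = chain_lengths(range(n), m)
alpha = n / m
mean_chain_length = sum(lengths) / m

print(lengths)            # [5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
print(alpha)              # 5.0
print(mean_chain_length)  # 5.0
```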

18
Choosing A Hash Function
  • Clearly, choosing the hash function well is
    crucial
  • What will a worst-case hash function do?
  • What will be the time to search in this case?
  • What are desirable features of the hash function?
  • Should distribute keys uniformly into slots
  • Should not depend on patterns in the data

19
Hash Functions: The Division Method
  • h(k) = k mod m
  • In words: hash k into a table with m slots using
    the slot given by the remainder of k divided by m
  • Example: m = 31 and k = 78, h(k) = 16.
  • Advantage: fast
  • Disadvantage: value of m is critical
  • Bad if keys bear relation to m
  • Or if hash does not depend on all bits of k
  • What happens to elements with adjacent values of
    k?
  • Elements with adjacent keys hashed to different
    slots: good
  • What happens if m is a power of 2 (say 2^p)?
  • What if m is a power of 10?
  • Pick m = prime number not too close to power of 2
    (or 10)
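
A short sketch of the division method shows why a power-of-2 table size is risky: with m = 2^p the hash is just the low p bits of k. The function name `h_div` and the sample keys are hypothetical.

```python
def h_div(k: int, m: int) -> int:
    """Division method: the slot is the remainder of k divided by m."""
    return k % m


print(h_div(78, 31))  # 16

# Three keys that differ only in their high bits (low 4 bits are 0101).
keys = [0b10110101, 0b00010101, 0b11110101]

# m = 16 = 2^4 keeps only the low 4 bits, so all three keys collide.
print([h_div(k, 16) for k in keys])  # [5, 5, 5]

# A prime m makes the slot depend on all bits of the key.
print([h_div(k, 31) for k in keys])  # [26, 21, 28]
```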

20
Hash Functions: The Multiplication Method
  • For a constant A, 0 < A < 1:
  • h(k) = ⌊m (kA mod 1)⌋ = ⌊m (kA - ⌊kA⌋)⌋

21
Hash Functions: The Multiplication Method
  • For a constant A, 0 < A < 1:
  • h(k) = ⌊m (kA mod 1)⌋ = ⌊m (kA - ⌊kA⌋)⌋
  • Here kA mod 1 is the fractional part of kA
  • Advantage: value of m is not critical
  • Disadvantage: relatively slower
  • Choose m = 2^p, for easier implementation

22
How to choose A?
  • The multiplication method works with any legal
    value of A.
  • Choose A not too close to 0 or 1
  • Knuth: a good choice is A = (√5 - 1)/2
  • Example: m = 1024, k = 123, A ≈ 0.6180339887
  • h(k) = ⌊1024 · (123 · 0.6180339887 mod 1)⌋
  • = ⌊1024 · 0.018180...⌋ = 18.
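
The worked example can be reproduced with a direct floating-point sketch of the formula. The function name `h_mul` is hypothetical, and floating point is used only for illustration.

```python
import math

A = (math.sqrt(5) - 1) / 2   # Knuth's suggested constant, about 0.6180339887


def h_mul(k: int, m: int, a: float = A) -> int:
    """Multiplication method: floor(m * fractional part of k*a)."""
    fractional = (k * a) % 1.0
    return math.floor(m * fractional)


print(h_mul(123, 1024))  # 18
```

For large keys the fractional part loses precision in floating point, which is one reason an implementation would normally use integer arithmetic instead.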

23
Multiplication Method - Implementation
  • Choose m = 2^p, for some integer p.
  • Let the word size of the machine be w bits.
  • Assume that k fits into a single word. (k takes w
    bits.)
  • Let 0 < s < 2^w. (s takes w bits.)
  • Restrict A to be of the form s/2^w.
  • Let k · s = r1 · 2^w + r0.
  • r1 holds the integer part of kA (⌊kA⌋) and r0
    holds the fractional part of kA (kA mod 1 = kA -
    ⌊kA⌋).
  • We don't care about the integer part of kA.
  • So, just use r0, and forget about r1.

24
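Multiplication Method - Implementation
[Figure: the w-bit key k is multiplied by the w-bit value s = A · 2^w. The 2w-bit product splits at the binary point into r1 (high-order word) and r0 (low-order word); the p most significant bits of r0 are extracted as h(k).]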
  • We want ⌊m (kA mod 1)⌋.
  • m = 2^p
  • We could get that by shifting r0 to the left by p
    bits and then taking the p bits that were shifted
    to the left of the binary point.
  • But, we don't need to shift. Just take the p most
    significant bits of r0.
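
The integer-only procedure can be sketched as below, assuming a word size of w = 32 and s = 2654435769, which is ⌊A · 2^32⌋ for A = (√5 - 1)/2. Both values and the function name `h_mul_int` are assumptions made for illustration.

```python
W = 32              # assumed word size in bits
S = 2654435769      # floor(A * 2**32) for A = (sqrt(5) - 1) / 2


def h_mul_int(k: int, p: int, w: int = W, s: int = S) -> int:
    """Multiplication method with m = 2**p, using only integer operations."""
    if not 0 <= k < (1 << w):
        raise ValueError("key must fit in a single w-bit word")
    r0 = (k * s) & ((1 << w) - 1)   # low-order word of the 2w-bit product
    return r0 >> (w - p)            # the p most significant bits of r0


print(h_mul_int(123, 10))  # 18, with m = 2**10 = 1024
```

Masking with 2^w - 1 discards r1, and the right shift by w - p keeps the top p bits of r0, so no left shift is needed.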

25
Hash Functions: Worst Case Scenario
  • Scenario:
  • You are given an assignment to implement hashing
  • You will self-grade in pairs, testing and grading
    your partner's implementation
  • In a blatant violation of the honor code, your
    partner:
  • Analyzes your hash function
  • Picks a sequence of "worst-case" keys that all
    map to the same slot, causing your implementation
    to take O(n) time to search
  • Exercise 11.2-5: when |U| > nm, for any fixed
    hashing function, we can always choose n keys that
    hash into the same slot.
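
The pigeonhole argument behind the exercise can be acted out in code: scan the universe, bucket keys by slot, and stop as soon as one slot holds n keys. The function name `colliding_keys` and the sample hash function are hypothetical.

```python
from collections import defaultdict


def colliding_keys(h, universe, n: int):
    """Return n keys from universe that h sends to the same slot, if any."""
    buckets = defaultdict(list)
    for k in universe:
        slot = h(k)
        buckets[slot].append(k)
        if len(buckets[slot]) == n:
            return buckets[slot]
    return None   # only possible when the universe is too small


m, n = 7, 5
h = lambda k: k % m                  # the fixed hash function under attack
universe = range(n * m + 1)          # |U| = 36 > nm = 35

print(colliding_keys(h, universe, n))  # [0, 7, 14, 21, 28]
```

With m slots and more than nm keys, some slot must receive at least n of them, so the search always succeeds for any fixed h.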

26
Universal Hashing
  • When attempting to defeat a malicious adversary,
    randomize the algorithm
  • Universal hashing: pick a hash function randomly
    in a way that is independent of the keys that are
    actually going to be stored
  • Pick a hash function randomly when the algorithm
    begins (not upon every insert!)
  • Guarantees good performance on average, no matter
    what keys adversary chooses
  • Need a family of hash functions to choose from

27
Universal Hashing
  • Let H be a (finite) collection of hash functions
  • that map a given universe U of keys
  • into the range {0, 1, ..., m-1}.
  • H is said to be universal if:
  • for each pair of distinct keys x, y ∈ U, the
    number of hash functions h ∈ H for which h(x) =
    h(y) is at most |H|/m
  • In other words:
  • With a random hash function from H, the chance of
    a collision between x and y is at most 1/m (x
    ≠ y)

28
Universal Hashing
  • Theorem 11.3 (modified from textbook):
  • Choose h from a universal family of hash
    functions
  • Hash n keys into a table of m slots, n ≤ m
  • Then the expected number of collisions involving
    a particular key x is less than 1
  • Proof:
  • For each pair of keys y, x, let c_yx = 1 if y and
    x collide, 0 otherwise
  • E[c_yx] ≤ 1/m (by definition)
  • Let C_x be total number of collisions involving
    key x
  • E[C_x] is the sum of E[c_yx] over the n-1 keys
    y ≠ x, so E[C_x] ≤ (n-1)/m
  • Since n ≤ m, we have E[C_x] < 1
  • Implication: expected running time of insertion
    is Θ(1)

29
A Universal Hash Function
  • Choose a prime number p that is larger than all
    possible keys
  • Choose table size m = n
  • Randomly choose two integers a, b, such that 1 ≤
    a ≤ p-1, and 0 ≤ b ≤ p-1
  • h_{a,b}(k) = ((ak + b) mod p) mod m
  • Example: p = 17, m = 6
  • h_{3,4}(8) = ((3·8 + 4) mod 17) mod 6 = 11 mod 6 = 5
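
Drawing one function from this family can be sketched as follows. The helper name `make_hash` is hypothetical; passing a and b explicitly reproduces the worked example, while omitting them draws both at random once, when the table is created.

```python
import random


def make_hash(p: int, m: int, a=None, b=None, rng=random):
    """Return h_{a,b}(k) = ((a*k + b) mod p) mod m for a prime p."""
    if a is None:
        a = rng.randint(1, p - 1)   # 1 <= a <= p-1
    if b is None:
        b = rng.randint(0, p - 1)   # 0 <= b <= p-1

    def h(k: int) -> int:
        return ((a * k + b) % p) % m

    return h


h_3_4 = make_hash(17, 6, a=3, b=4)
print(h_3_4(8))  # 5

h_random = make_hash(17, 6)     # chosen once, then reused for every key
print([h_random(k) for k in range(5)])
```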

30
A universal hash function
  • Theorem 11.5: The family of hash functions H_{p,m}
    = {h_{a,b}} defined on the previous slide is universal
  • Proof sketch:
  • For any two distinct keys x, y, and a given h_{a,b},
  • let r = (ax + b) mod p, s = (ay + b) mod p.
  • It can be shown that r ≠ s, and that different (a, b)
    result in different (r, s)
  • x and y collide only when r mod m = s mod m
  • For a given r, the number of values s such that
    r mod m = s mod m and r ≠ s is at most (p-1)/m
  • For a given r and a randomly chosen s ≠ r,
    prob(r mod m = s mod m) ≤ ((p-1)/m) / (p-1) = 1/m
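
For small parameters the universality condition can be checked by brute force: enumerate every (a, b), and for each pair of distinct keys count how many functions make them collide. This is an illustrative check under the assumed values p = 17 and m = 6, not a proof; the function name `max_collisions` is hypothetical.

```python
from itertools import combinations


def max_collisions(p: int, m: int) -> int:
    """Largest number of functions h_{a,b} under which two distinct keys collide."""
    worst = 0
    for x, y in combinations(range(p), 2):       # all pairs of distinct keys
        count = sum(
            1
            for a in range(1, p)
            for b in range(p)
            if ((a * x + b) % p) % m == ((a * y + b) % p) % m
        )
        worst = max(worst, count)
    return worst


p, m = 17, 6
family_size = p * (p - 1)            # number of (a, b) pairs, |H|
worst = max_collisions(p, m)

# Universality requires worst <= |H| / m for every pair of distinct keys.
print(worst * m <= family_size)
```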