CSC 172 DATA STRUCTURES - PowerPoint PPT Presentation

Slides: 39
Provided by: Thad167

Transcript and Presenter's Notes



1
CSC 172 DATA STRUCTURES
2
SETS and HASHING
  • Unadvertised in-store special: SETS!
  • In Java, see Weiss 4.8
  • Simple idea: characteristic vectors
  • HASHING: the main event

3
Representation of Sets
  • List
    – Simple O(n) dictionary operations
  • Binary Search Trees
    – O(log n) average time
    – Range queries, sorting
  • Characteristic Vector
    – O(1) dictionary ops, but limited to sets drawn
      from a small universal set
  • Hash Table
    – O(1) average time for dictionary ops
    – Tricky to expand, no range queries

4
Characteristic Vectors
  • Boolean strings whose positions correspond to the
    members of some fixed universal set
  • A 1 in a position means that the element is in
    the set
  • A 0 means that it is not
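Java's standard library already provides a characteristic vector: java.util.BitSet. The following is a minimal sketch, assuming the universal set is {0, 1, ..., universeSize - 1}. The class name CharacteristicVector is a hypothetical label chosen for illustration.

```java
import java.util.BitSet;

// Hypothetical helper: a characteristic vector over the universal set
// {0, 1, ..., universeSize - 1}, stored in java.util.BitSet.
class CharacteristicVector {
    private final BitSet bits;
    private final int universeSize;

    CharacteristicVector(int universeSize, int... members) {
        this.universeSize = universeSize;
        this.bits = new BitSet(universeSize);
        for (int x : members) {
            if (x < 0 || x >= universeSize) {
                throw new IllegalArgumentException("not in the universal set: " + x);
            }
            bits.set(x); // a 1 in position x means x is in the set
        }
    }

    boolean contains(int x) {
        return x >= 0 && x < universeSize && bits.get(x);
    }

    // The Boolean string, position 0 first
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(universeSize);
        for (int i = 0; i < universeSize; i++) {
            sb.append(bits.get(i) ? '1' : '0');
        }
        return sb.toString();
    }
}
```

For the universal set {0, ..., 9}, the set {1, 3, 4} prints as 0101100000.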

5
MUSIC THEORY
  • A chord is a set of notes played at the same
    time.
  • Represented by a 12-bit vector called a pitch-class
    set
  • Bit order (left to right): B, A#, A, G#, G, F#, F,
    E, D#, D, C#, C
  • 000010010001 represents C major (C, E, G)
  • 000010001001 represents C minor (C, D#, G)
  • Rotation is transposition
  • Bit reversal is inversion
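The chord operations below are a minimal sketch on 12-bit ints. The sketch assumes the slide's left-to-right order B ... C, which puts C at bit 0 and B at bit 11. Under that convention, rotating left transposes up by one semitone per step. The class and method names are hypothetical.

```java
// Hypothetical helper. Assumed convention: bit 0 = C, bit 1 = C#, ..., bit 11 = B,
// so the printed 12-bit string reads B ... C from left to right, as on the slide.
class PitchClassSet {
    static final int MASK = 0xFFF; // 12 bits

    static int parse(String bits) {
        return Integer.parseInt(bits, 2) & MASK;
    }

    static String format(int set) {
        // pad to exactly 12 binary digits
        return String.format("%12s", Integer.toBinaryString(set & MASK)).replace(' ', '0');
    }

    // Transpose up k semitones: rotate left within 12 bits (B wraps around to C)
    static int transpose(int set, int k) {
        set &= MASK;
        k = Math.floorMod(k, 12);
        return ((set << k) | (set >>> (12 - k))) & MASK;
    }

    // Invert: reverse the 12 bits, mapping pitch class p to 11 - p
    static int invert(int set) {
        return Integer.reverse(set & MASK) >>> 20;
    }
}
```

Transposing C major up two semitones gives 001001000100 (D, F#, A). Inverting C major gives 100010010000 (B, G, E), which is an E minor chord.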

6
UNIX file privileges
  • {user, group, others} × {read, write, execute}
  • 9 possible privileges
  • Type ls -l on UNIX:
      total 142
      -rw-rw-r-- 1 pawlicki none    76 Jun 20  2000 PKG416.desc
      -rw-rw-r-- 1 pawlicki none 28906 Jun 20  2000 PKG416.pdf
      -rw-rw-r-- 1 pawlicki none  1849 Jun 20  2000 let.1
      -rw-rw-r-- 1 pawlicki none     0 Apr  2 13:03 out
      -rw-rw-r-- 1 pawlicki none 39891 Jun 20  2000 stapp.uu

7
UNIX files
  • The order is rwx for each of user (owner), group,
    and others
  • So, a protection mode of 110100000 (rw-r-----)
    means that the owner may read and write (but not
    execute), the group can only read, and others
    cannot even read
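The following sketch converts between the 9-bit mode and the rwx letters that ls -l prints, and tests a single permission bit. FileMode and its methods are hypothetical names chosen for illustration.

```java
// Hypothetical helper for the 9 permission bits, ordered rwx for user,
// group, then others (leftmost bit = user read).
class FileMode {
    private static final char[] LETTERS = {'r', 'w', 'x'};

    // "110100000" -> "rw-r-----"
    static String toSymbolic(String bits) {
        if (bits.length() != 9) throw new IllegalArgumentException("expected 9 bits");
        StringBuilder sb = new StringBuilder(9);
        for (int i = 0; i < 9; i++) {
            sb.append(bits.charAt(i) == '1' ? LETTERS[i % 3] : '-');
        }
        return sb.toString();
    }

    // "rw-rw-r--" (ls -l output without its leading file-type character) -> mode bits
    static int fromSymbolic(String perms) {
        if (perms.length() != 9) throw new IllegalArgumentException("expected 9 characters");
        int mode = 0;
        for (int i = 0; i < 9; i++) {
            mode = (mode << 1) | (perms.charAt(i) == '-' ? 0 : 1);
        }
        return mode;
    }

    // who: 0 = user, 1 = group, 2 = others; op: 0 = read, 1 = write, 2 = execute
    static boolean allows(int mode, int who, int op) {
        int bit = 8 - (3 * who + op); // the leftmost character is bit 8
        return ((mode >> bit) & 1) == 1;
    }
}
```

For the listing above, fromSymbolic("rw-rw-r--") yields 110110100, which is 664 in octal.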

8
GAMBLING
  • A deck has 52 cards
  • 2C, 2H, 2S, 2D, 3C, ..., KD, AC, AH, AS, AD
  • Represent a hand as a vector of 52 bits
  • 0000000000000000000000000000000000000000000000000101
    is a pair of aces (AH and AD)
  • In Texas Hold'em, each player gets two hole cards,
    and there are 5 shared board cards
  • We can use bitwise operations to find hands
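A sketch of bitwise hand tests using a Java long, of which 52 of the 64 bits are used. The card layout is an assumption of the sketch, chosen to match the slide's order: card index = 4 * rank + suit, with ranks running from 2 up to A and suits ordered C, H, S, D. PokerBits and its methods are hypothetical names.

```java
// Hypothetical layout matching the slide's card order 2C, 2H, 2S, 2D, 3C, ..., AD:
// card index = 4 * rank + suit, rank 0 = deuce ... 12 = ace, suits C, H, S, D.
// Bit i of the long is set when card i is held.
class PokerBits {
    static final String RANKS = "23456789TJQKA";
    static final String SUITS = "CHSD";
    // Bit 0 of each of the 13 four-bit rank groups, i.e. all the clubs
    static final long CLUBS = 0x1111111111111L;

    static int index(String card) {
        if (card.length() != 2) throw new IllegalArgumentException(card);
        int rank = RANKS.indexOf(card.charAt(0));
        int suit = SUITS.indexOf(card.charAt(1));
        if (rank < 0 || suit < 0) throw new IllegalArgumentException(card);
        return 4 * rank + suit;
    }

    static long hand(String... cards) {
        long h = 0L;
        for (String c : cards) h |= 1L << index(c);
        return h;
    }

    // The 52-character vector with 2C first, as written on the slide
    static String toBits(long h) {
        StringBuilder sb = new StringBuilder(52);
        for (int i = 0; i < 52; i++) sb.append(((h >>> i) & 1L) == 1L ? '1' : '0');
        return sb.toString();
    }

    // How many cards of the given rank the hand holds
    static int countRank(long h, int rank) {
        return Long.bitCount((h >>> (4 * rank)) & 0xFL);
    }

    // At least two cards of some rank
    static boolean hasPair(long h) {
        for (int r = 0; r < 13; r++) {
            if (countRank(h, r) >= 2) return true;
        }
        return false;
    }

    // At least five cards of one suit: shift the clubs mask to each suit in turn
    static boolean hasFlush(long h) {
        for (int s = 0; s < 4; s++) {
            if (Long.bitCount(h & (CLUBS << s)) >= 5) return true;
        }
        return false;
    }
}
```

In Hold'em, a player's seven available cards are simply hole | board, and each test above is then a few shifts, masks, and population counts.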

9
CV advantages
  • If the universal set is small, sets can be
    represented by bits packed 32 to a word
  • Insert, delete, and lookup are O(1) on the proper
    bit
  • Union, intersection, and difference are
    implemented on a word-by-word basis
  • O(m), where m is the size of the universal set
  • Small constant factor (1/32)
  • Fast machine operations
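Below is a sketch of the packed representation. It assumes the universal set is {0, ..., m-1} and that both operands of a union, intersection, or difference share the same m. PackedSet is a hypothetical name; java.util.BitSet does the same job in production code.

```java
// Hypothetical sketch: a characteristic vector over {0, ..., m-1}, 32 members per int word.
class PackedSet {
    private final int m;
    private final int[] words;

    PackedSet(int m) {
        this.m = m;
        this.words = new int[(m + 31) / 32];
    }

    // O(1): member x lives in word x / 32, bit x % 32 (x must be in 0..m-1)
    void insert(int x) { words[x >>> 5] |= 1 << (x & 31); }
    void delete(int x) { words[x >>> 5] &= ~(1 << (x & 31)); }
    boolean contains(int x) { return (words[x >>> 5] & (1 << (x & 31))) != 0; }

    // O(m / 32): one machine operation per word; both sets must share the same m
    PackedSet union(PackedSet other) {
        PackedSet r = new PackedSet(m);
        for (int i = 0; i < words.length; i++) r.words[i] = words[i] | other.words[i];
        return r;
    }

    PackedSet intersection(PackedSet other) {
        PackedSet r = new PackedSet(m);
        for (int i = 0; i < words.length; i++) r.words[i] = words[i] & other.words[i];
        return r;
    }

    PackedSet difference(PackedSet other) {
        PackedSet r = new PackedSet(m);
        for (int i = 0; i < words.length; i++) r.words[i] = words[i] & ~other.words[i];
        return r;
    }

    // Number of members: population count of each word
    int size() {
        int n = 0;
        for (int w : words) n += Integer.bitCount(w);
        return n;
    }
}
```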

10
Hashing
  • A cool way to get from an element x to the place
    where x can be found
  • An array 0..B-1 of buckets
  • Each bucket contains a list of set elements
  • B = number of buckets
  • A hash function that takes potential set elements
    and quickly produces a random-looking (but
    deterministic) integer in 0..B-1

11
Example
  • If the set elements are integers, then the
    simplest and usually best hash function is
    h(x) = x % B, or equivalently h(x) = x - (x/B)*B
    with integer division, for nonnegative x
    (B must never be 0)
  • Suppose B = 6 and we wish to store the integers
    70, 53, 99, 94, 83, 76, 64, 30
  • They belong in buckets 4, 5, 3, 4, 5, 4, 4,
    and 0
  • Note: if B = 7, they go to 0, 4, 1, 3, 6, 6, 1, 2
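The example can be checked with a few lines of Java. This sketch uses a hypothetical class, ModHash, and calls Math.floorMod rather than %, because Java's % returns a negative remainder for negative x (for example, -1 % 6 is -1).

```java
// Hypothetical helper: h(x) = x mod B for integer keys.
class ModHash {
    static int h(int x, int b) {
        if (b <= 0) throw new IllegalArgumentException("B must be positive");
        return Math.floorMod(x, b); // stays in 0..B-1 even for negative x
    }

    static int[] bucketsFor(int[] keys, int b) {
        int[] out = new int[keys.length];
        for (int i = 0; i < keys.length; i++) out[i] = h(keys[i], b);
        return out;
    }
}
```

With B = 6, four of the eight keys share bucket 4; with B = 7, no bucket holds more than two.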

12
Pitfalls of Hash Function Selection
  • We want to get a uniform distribution of elements
    into buckets
  • Beware of data patterns that cause non-uniform
    distribution

13
Example
  • If the integers were all even, then B = 6 would
    cause only buckets 0, 2, and 4 to fill
  • If we hashed the words in the UNIX dictionary into
    10 buckets by word length, then 20% would go into
    bucket 7
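The even-number pitfall becomes visible when you count how many keys land in each bucket. The following sketch does exactly that. BucketSpread is a hypothetical name, and the keys are simply the first n even numbers.

```java
// Hypothetical helper: how many keys land in each of B buckets under h(x) = x mod B.
class BucketSpread {
    static int[] counts(int[] keys, int b) {
        int[] c = new int[b];
        for (int k : keys) c[Math.floorMod(k, b)]++;
        return c;
    }

    // The first n even numbers: 0, 2, 4, ...
    static int[] evens(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = 2 * i;
        return a;
    }
}
```

With B = 6, sixty even keys split 20/20/20 over buckets 0, 2, and 4. With B = 7, which shares no factor with 2, seventy even keys land exactly 10 per bucket.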

14
Dictionary Operations
  • Lookup: go to the head of bucket h(x) and search
    the bucket's list to see whether x is in it
  • Insertion: append x to the list if it is not found
  • Deletion: ordinary list deletion from the bucket's
    list
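Each of the three operations maps directly onto a bucket array of linked lists. Here is a minimal sketch, assuming a fixed number of buckets and using Java's hashCode() as the starting point for h(x). ChainedHashSet is a hypothetical name. Since Integer.hashCode() returns the value itself, nonnegative integer keys land in bucket x mod B, as in the earlier example.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch: B buckets, each holding a linked list of elements.
class ChainedHashSet<T> {
    private final List<LinkedList<T>> buckets;

    ChainedHashSet(int numBuckets) {
        buckets = new ArrayList<>(numBuckets);
        for (int i = 0; i < numBuckets; i++) buckets.add(new LinkedList<>());
    }

    // h(x): hashCode reduced to 0..B-1 (floorMod handles negative hash codes)
    private LinkedList<T> bucketFor(T x) {
        return buckets.get(Math.floorMod(x.hashCode(), buckets.size()));
    }

    // Lookup: go to bucket h(x) and search its list
    boolean contains(T x) {
        return bucketFor(x).contains(x);
    }

    // Insertion: append to the bucket's list only if x is not already there
    boolean insert(T x) {
        LinkedList<T> b = bucketFor(x);
        if (b.contains(x)) return false;
        b.addLast(x);
        return true;
    }

    // Deletion: ordinary list deletion within bucket h(x)
    boolean delete(T x) {
        return bucketFor(x).remove(x);
    }
}
```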

15
Analysis
  • If we pick B to be about n, the number of elements
    in the set, then the average list is O(1) long
  • Thus, dictionary ops take O(1) average time
  • Worst case: all elements go into one bucket
  • O(n) per operation

16
Managing Hash Table Size
  • If n gets as high as 2B, create a new hash table
    with 2B buckets
  • Rehash every element into the new table
  • O(n) time total
  • There were at least n/2 inserts since the last
    rehash
  • All these inserts took O(n) time in total
  • Thus, we amortize the cost of rehashing over
    the inserts since the last rehash
  • At worst, this adds a constant factor per insert
  • So, even with rehashing, we get O(1) amortized
    time per operation
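A small simulation can check the amortization argument. This sketch does not store keys. It tracks only n and B under the slide's rule (double B when n reaches 2B) and counts how many element moves all the rehashes cost. RehashCost, totalRehashMoves, and the starting bucket count are hypothetical.

```java
// Hypothetical simulation of the resizing rule: whenever the element count n
// reaches 2B, all n elements are rehashed into a table with 2B buckets.
class RehashCost {
    // Total element moves made by all rehashes during `inserts` insertions
    static long totalRehashMoves(int inserts, int initialBuckets) {
        if (initialBuckets <= 0) throw new IllegalArgumentException("need at least one bucket");
        long moves = 0;
        long buckets = initialBuckets;
        for (long n = 1; n <= inserts; n++) {   // n = element count after this insert
            if (n >= 2 * buckets) {
                moves += n;                      // every element moves to the new table
                buckets *= 2;
            }
        }
        return moves;
    }
}
```

Starting from 4 buckets, 1000 inserts trigger rehashes at n = 8, 16, ..., 512, for 1016 moves in all. Because the rehash sizes double, their sum stays below 2n, so rehashing adds only a constant amount of work per insert.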

17
Collisions
  • A collision occurs when two elements of the set
    hash to the same value
  • There are several ways to deal with this:
    – Chaining (using a linked list or some secondary
      structure)
    – Open addressing: double hashing, linear probing

18
Chaining
  • Very efficient time-wise
  • Other approaches use less space
19
Open Addressing
  • When a collision occurs, if the table is not full,
    find an available slot:
    – Linear probing
    – Quadratic probing
    – Double hashing

20
Linear Probing
  • If the current location is occupied, try the next
    table location
  • LinearProbingInsert(K)
        if (table is full) error
        probe = h(K)
        while (table[probe] is occupied)
            probe = (probe + 1) % M
        table[probe] = K
  • Walk along the table until an empty spot is found
  • Uses less memory than chaining (no links)
  • Takes more time than chaining (long walks)
  • Deleting is a pain (mark a slot as having been
    deleted)
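Here is a runnable version of LinearProbingInsert, together with lookup and the "mark as deleted" trick. It is a sketch over int keys with h(K) = K mod M. The class name and the EMPTY/FULL/DELETED slot states are assumptions of the sketch rather than details from the slide.

```java
// Hypothetical sketch: open addressing with linear probing for int keys, h(K) = K mod M.
class LinearProbingTable {
    private static final byte EMPTY = 0, FULL = 1, DELETED = 2;
    private final int[] keys;
    private final byte[] state;   // all slots start EMPTY
    private int count;            // number of FULL slots

    LinearProbingTable(int m) {
        keys = new int[m];
        state = new byte[m];
    }

    private int h(int k) { return Math.floorMod(k, keys.length); }

    // LinearProbingInsert(K): walk from h(K) to the first slot that is not FULL.
    // Like the slide's pseudocode, it does not check for duplicates.
    int insert(int k) {
        if (count == keys.length) throw new IllegalStateException("table is full");
        int probe = h(k);
        while (state[probe] == FULL) probe = (probe + 1) % keys.length;
        keys[probe] = k;
        state[probe] = FULL;
        count++;
        return probe;
    }

    // Lookup walks past DELETED slots and stops only at a never-used EMPTY slot
    int find(int k) {
        int probe = h(k);
        for (int i = 0; i < keys.length && state[probe] != EMPTY; i++) {
            if (state[probe] == FULL && keys[probe] == k) return probe;
            probe = (probe + 1) % keys.length;
        }
        return -1;
    }

    // Deletion only marks the slot, so later keys in the same run stay reachable
    boolean delete(int k) {
        int slot = find(k);
        if (slot < 0) return false;
        state[slot] = DELETED;
        count--;
        return true;
    }
}
```

The marker matters. If deleting 32 simply emptied slot 6, a later search for 31 (stored in slot 8 after probing 5, 6, 7) would stop at slot 6 and wrongly report 31 as missing.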

21–31
Linear Probing
h(K) = K % 13
Insert 18, 41, 22, 59, 32, 31, 73
h(K) = 5, 2, 9, 7, 6, 5, 8
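The linear probing trace can be replayed mechanically. This sketch records every slot each key examines. LinearProbeTrace is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical replay of a linear-probing example: for each key, record every slot
// examined, starting at h(K) = K mod M; the last slot recorded is where K is stored.
class LinearProbeTrace {
    static List<List<Integer>> trace(int[] keys, int m) {
        if (keys.length > m) throw new IllegalArgumentException("more keys than slots");
        Integer[] table = new Integer[m]; // null marks an empty slot
        List<List<Integer>> paths = new ArrayList<>();
        for (int k : keys) {
            List<Integer> path = new ArrayList<>();
            int probe = Math.floorMod(k, m);
            path.add(probe);
            while (table[probe] != null) {  // occupied: try the next slot
                probe = (probe + 1) % m;
                path.add(probe);
            }
            table[probe] = k;
            paths.add(path);
        }
        return paths;
    }
}
```

For 18, 41, 22, 59, 32, 31, 73 with M = 13, key 31 probes 5, 6, 7, 8 and key 73 probes 8, 9, 10. The final table holds 41 in slot 2, 18 in 5, 32 in 6, 59 in 7, 31 in 8, 22 in 9, and 73 in 10.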
32
Double Hashing
  • If the current location is occupied, try another
    table location
  • Use two hash functions
  • If M is prime (and h2(K) is never 0 mod M), the
    probe sequence eventually examines every location
  • DoubleHashInsert(K)
        if (table is full) error
        probe = h1(K)
        offset = h2(K)
        while (table[probe] is occupied)
            probe = (probe + offset) % M
        table[probe] = K
  • Many of the same (dis)advantages as linear
    probing
  • Distributes keys more evenly than linear probing
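DoubleHashInsert translates almost line for line into Java. This sketch assumes int keys, a prime table size M, and an h2 that never returns 0 mod M. The class name and the use of IntUnaryOperator for h1 and h2 are choices of this sketch.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical sketch of DoubleHashInsert for int keys. Assumes M is prime and
// h2 never returns 0 mod M, so probe, probe + offset, ... visits every slot.
class DoubleHashTable {
    private final Integer[] table; // null marks an empty slot
    private final IntUnaryOperator h1, h2;
    private int count;

    DoubleHashTable(int m, IntUnaryOperator h1, IntUnaryOperator h2) {
        this.table = new Integer[m];
        this.h1 = h1;
        this.h2 = h2;
    }

    // Returns the slot where k was stored
    int insert(int k) {
        int m = table.length;
        if (count == m) throw new IllegalStateException("table is full");
        int probe = Math.floorMod(h1.applyAsInt(k), m);
        int offset = Math.floorMod(h2.applyAsInt(k), m);
        if (offset == 0) throw new IllegalArgumentException("h2(K) must not be 0 mod M");
        while (table[probe] != null) probe = (probe + offset) % m; // jump by the offset
        table[probe] = k;
        count++;
        return probe;
    }
}
```

Usage: new DoubleHashTable(13, k -> k % 13, k -> 8 - k % 8) builds a table with the two hash functions of the example that follows.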

33
Quadratic Probing
  • Don't step by 1 each time. Add i² to the hashed
    location h(x) (mod B, of course) for i = 1, 2, ...
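Here is a sketch of quadratic probing for int keys, assuming h(x) = x mod B; the class name is hypothetical. The sequence h(x) + i² need not reach every slot, so the loop gives up after B probes. A standard result is that if B is prime and the table is at least half empty, an empty slot is always found.

```java
// Hypothetical sketch of quadratic probing for int keys with h(x) = x mod B.
class QuadraticProbingTable {
    private final Integer[] table; // null marks an empty slot

    QuadraticProbingTable(int b) { table = new Integer[b]; }

    // Tries h(x), then h(x) + 1, h(x) + 4, h(x) + 9, ... (mod B).
    // Returns the slot used, or -1 if none of the first B probes was free.
    int insert(int x) {
        int b = table.length;
        int home = Math.floorMod(x, b);
        for (int i = 0; i < b; i++) {
            int probe = (int) ((home + (long) i * i) % b); // long avoids overflow of i*i
            if (table[probe] == null) {
                table[probe] = x;
                return probe;
            }
        }
        return -1;
    }
}
```

On the running example with B = 13, key 31 probes 5, 6, 9, and then 5 + 9 = 14 mod 13 = 1, landing in slot 1 rather than the slot 8 that linear probing chose.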

34–36
Double Hashing
h1(K) = K % 13, h2(K) = 8 - K % 8
Insert 18, 41, 22, 59, 32, 31, 73
h1(K) = 5, 2, 9, 7, 6, 5, 8
h2(K) = 6, 7, 2, 5, 8, 1, 7
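The double-hashing walk can be replayed the same way. This sketch hard-codes the example's M = 13, h1(K) = K mod 13, and h2(K) = 8 - (K mod 8). DoubleHashTrace is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical replay of the double-hashing example: h1 picks the first slot and
// h2, which is always in 1..8 and so never 0 mod 13, is the step size.
class DoubleHashTrace {
    static final int M = 13;

    static int h1(int k) { return Math.floorMod(k, M); }
    static int h2(int k) { return 8 - Math.floorMod(k, 8); }

    // For each key, the slots examined; the last one is where the key is stored
    static List<List<Integer>> trace(int[] keys) {
        if (keys.length > M) throw new IllegalArgumentException("more keys than slots");
        Integer[] table = new Integer[M]; // null marks an empty slot
        List<List<Integer>> paths = new ArrayList<>();
        for (int k : keys) {
            List<Integer> path = new ArrayList<>();
            int probe = h1(k), offset = h2(k);
            path.add(probe);
            while (table[probe] != null) {
                probe = (probe + offset) % M;
                path.add(probe);
            }
            table[probe] = k;
            paths.add(path);
        }
        return paths;
    }
}
```

Here 31 steps by 1 from slot 5 and lands in 8. Key 73 starts at 8 and steps by 7 through 2 and 9 before landing in slot 3, whereas linear probing put it in slot 10 after the run 8, 9, 10.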
37
Theoretical Results
38
[Figure: Expected Probes]
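Results like these are usually expressed in terms of the load factor α = n/M. For reference, the standard textbook approximations for open addressing are given below; the linear-probing formulas are due to Knuth. S is the expected number of probes for a successful search, and U is the expected number for an unsuccessful search or an insertion. These are the usual formulas, not a reconstruction of the slide's plot.

```latex
\begin{aligned}
&\alpha = n/M, \qquad 0 \le \alpha < 1 \\
&\text{Linear probing:}\quad
  S(\alpha) \approx \tfrac{1}{2}\Bigl(1 + \tfrac{1}{1-\alpha}\Bigr), \qquad
  U(\alpha) \approx \tfrac{1}{2}\Bigl(1 + \tfrac{1}{(1-\alpha)^2}\Bigr) \\
&\text{Uniform hashing (approximated by double hashing):}\quad
  S(\alpha) \approx \tfrac{1}{\alpha}\ln\tfrac{1}{1-\alpha}, \qquad
  U(\alpha) \approx \tfrac{1}{1-\alpha} \\
&\alpha = 0.5:\quad \text{linear } S = 1.5,\ U = 2.5; \qquad
  \text{uniform } S \approx 1.39,\ U = 2
\end{aligned}
```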