Data Types and Data Structures


Provided by: charle106

Learn more at: http://www.cse.msu.edu


1
Data Types and Data Structures

Data Types
Containers
Dictionaries
Priority Queue
Data Structures
Hash Tables
Binary Search Trees

2
Data Types and Structures
There are numerous options for data structures
for many commonly used abstract data types:
Containers
Dictionaries
Priority Queues
Changing data structures should not change the
correctness of a program, but it can have a
dramatic effect on the speed.
3
Choosing a Data Structure
It is important to choose the proper data
structure when you first design an algorithm.
There are many data structures that can handle
common operations: insertion, deletion, sorting,
searching, finding the maximum or minimum,
predecessor or successor, etc.
Different data structures each take their own
time for the different operations.
4
Guidelines...

Building an algorithm around a properly chosen
data structure leads to both a clean algorithm
and good performance
Using an incorrect data structure can be
disastrous, but you don't always need the best
structure.
Sorting is at the heart of many good algorithms.
Common algorithm design paradigms include
divide-and-conquer, randomization, incremental
construction, and dynamic programming.

5
Fundamental Data Types
An abstract data type is a collection of
well-defined operations that can be performed on
a particular structure.
Different data structures make different tradeoffs
that make certain operations (say, insertion)
faster at the cost of others (say, searching).
Often there will be other considerations that
make one structure more desirable than others.
6
Containers

Hold data for later retrieval
Operations
Insert(item)
Retrieve(): typically removes the item from the
container
Simple data structures for implementing
containers
Stack: LIFO
Queue: FIFO
Table: retrieve by index
Implementation
Linked list or array
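
The three container disciplines above can be sketched in a few lines. This is only an illustration, assuming Python's built-in list and collections.deque as the underlying array and linked structure; the variable names are hypothetical.

```python
from collections import deque

# Stack (LIFO): insert and retrieve at the same end of a list.
stack = []
for item in (1, 2, 3):
    stack.append(item)                      # Insert(item)
stack_order = [stack.pop() for _ in range(3)]   # Retrieve(): newest item first

# Queue (FIFO): insert at one end, retrieve from the other.
queue = deque()
for item in (1, 2, 3):
    queue.append(item)                      # Insert(item)
queue_order = [queue.popleft() for _ in range(3)]  # Retrieve(): oldest item first

# Table: retrieve by index, without removing the item.
table = ["a", "b", "c"]
second = table[1]
```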

7
Dictionaries

Dictionaries are a form of container that permits
access to data items by content (key).
Operations
Insert(key)
delete(pointer to item)
search(key)
Linked list implementation (no sorting)
Insert
Delete
Search
Sorted array implementation
Insert
Delete
Search
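
As one possible sketch of the sorted array implementation, the class below keeps keys in a sorted Python list. The class name SortedArrayDict is hypothetical, and delete takes a key rather than a pointer to the item, which is a simplification of the interface listed above. Search is a binary search, while insert and delete must shift elements.

```python
import bisect

class SortedArrayDict:
    """Keys kept in a sorted list (illustrative name, not from the slides)."""

    def __init__(self):
        self.keys = []

    def search(self, key):
        # Binary search: O(log n) comparisons. Returns the index or None.
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None

    def insert(self, key):
        # Finding the position is O(log n), but shifting elements is O(n).
        bisect.insort(self.keys, key)

    def delete(self, key):
        # Locate the key, then close the gap: O(n) in the worst case.
        i = self.search(key)
        if i is not None:
            del self.keys[i]
```

An unsorted linked list makes the opposite tradeoff: insertion is cheap, but search has to scan the whole list.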

8
Priority Queues
Insert(x): Given an item x, insert it into the
priority queue.
Find-Maximum(): Return the item with the maximal
priority.
Delete-Maximum(): Remove the item from the queue
whose key is maximum.
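
A minimal sketch of these three operations, assuming numeric priorities and using Python's heapq module (a min-heap) with negated keys. The class name MaxPriorityQueue is hypothetical, and a binary heap is just one of several data structures that could back this data type.

```python
import heapq

class MaxPriorityQueue:
    """Max-priority queue on top of heapq's min-heap (assumes numeric items)."""

    def __init__(self):
        self._heap = []

    def insert(self, x):
        # Store the negated value so the largest x sits at the heap root.
        heapq.heappush(self._heap, -x)

    def find_maximum(self):
        # The root of the min-heap holds the negation of the maximum.
        return -self._heap[0]

    def delete_maximum(self):
        # Remove and return the maximum item.
        return -heapq.heappop(self._heap)
```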
9
Data Structures

Ways to implement data types
Linked lists
Arrays with auxiliary data
Hash table
Binary search tree
Others, of course

10
Hash Tables

Maintain an array to hold your items
Hash the key to determine the index the
specific item should be stored at
Good hash functions
Methods for dealing with collisions
Chaining
Universal hash functions
Open addressing

11
Direct-address hash table

Assumptions
Universe of keys is small (size m)
Set of keys can be mapped to {0, 1, ..., m-1}
No elements have the same key
Use an array of size m
Array contents can be pointer to element
Array can directly store element
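
Under these assumptions a direct-address table is just an array indexed by key. The sketch below is illustrative only; the class name is hypothetical, keys are assumed to be integers in 0..m-1, and the element is stored directly in the slot.

```python
class DirectAddressTable:
    """Array of size m indexed directly by key (keys assumed in 0..m-1)."""

    def __init__(self, m):
        self.slots = [None] * m     # None marks an empty slot

    def insert(self, key, element):
        self.slots[key] = element   # O(1): the key is the index

    def search(self, key):
        return self.slots[key]      # O(1); None if nothing is stored

    def delete(self, key):
        self.slots[key] = None      # O(1)
```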

12
Hash Functions

Problem with direct-addressed tables
Universe of possible keys U is too large
Set of keys used K may be much smaller
Hash function
Use an array of size Θ(m)
Use function h(k) = x to determine slot x
h: U → {0, 1, ..., m-1}
Collision
When h(k1) = h(k2) for distinct keys k1 and k2

13
Good Hash Functions

Each key is equally likely to hash to any of the
m slots independently of where any other key has
hashed to
Difficult to achieve as this requires knowledge
of distribution of keys
Good characteristics
Must be able to evaluate quickly
May want keys that are close together to map to
slots that are far apart

14
Hashing by Height
[Figure: slots numbered 1 through 9]
15
Collisions unavoidable
Even if we have a good function, we will still
have collisions
[Figure: slots labeled by month, Jan through Dec]
16
Chaining

Create a linked list to store all elements that
map to same table slot
Running time
Insert(T,k) how long? what assumptions?
Search(T,k) how long?
Delete(T,x) pointer to element x, how long, what
assumptions?
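
A rough sketch of chaining, assuming Python lists stand in for the linked lists and the built-in hash() reduced mod m stands in for the hash function. Both are substitutions for illustration, and the class name is hypothetical. With real doubly linked chains, delete given a pointer to the element would not need the scan shown here.

```python
class ChainedHashTable:
    """Hash table with one chain (here a Python list) per slot."""

    def __init__(self, m):
        self.m = m
        self.slots = [[] for _ in range(m)]

    def _h(self, key):
        # Stand-in hash function: built-in hash reduced to 0..m-1.
        return hash(key) % self.m

    def insert(self, key):
        # Constant time, assuming the key is not already in the table.
        self.slots[self._h(key)].append(key)

    def search(self, key):
        # Time proportional to the length of one chain.
        return key in self.slots[self._h(key)]

    def delete(self, key):
        # Scans the chain because we hold a key, not a pointer to the element.
        chain = self.slots[self._h(key)]
        if key in chain:
            chain.remove(key)
```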

17
Search time

Notation
n items
m slots
load factor α = n/m
Worst-case search time?
What is worst case?
Expected search time
Simple uniform hashing each element is equally
likely to hash to any of the m slots, independent
of where any other element has hashed to.
Expected search time?

18
Universal hashing

In the worst case, for any fixed hash function, the
keys may be exactly the worst case for your
function
Avoid this by choosing the hash function randomly,
independent of the keys to be hashed
Key distinction from probabilistic analysis
A universal hash function will work well with high
probability on EVERY input instance but may
perform poorly with low probability on EVERY input
instance
Probabilistic analysis of a static hash function h
says h will work well on most input instances
every time but may perform poorly on some input
instances every time

19
Definition and analysis

Let H be a finite collection of hash functions
that map U into {0, ..., m-1}
This collection is universal if for each pair of
distinct keys k and q in U, the number of hash
functions h in H for which h(k) = h(q) is at most
|H|/m.
If we choose our hash function randomly from H,
this implies that there is at most a 1/m chance
that h(k) = h(q).
This leads to the expected length of a chain being
n/m
Note we assume chaining and not open addressing
in analysis

20
An example of universal hash functions

Choose prime p larger than all possible keys
Let Z_p = {0, ..., p-1} and Z_p* = {1, ..., p-1}
Clearly p > m. Why?
Define h_{a,b} for any a in Z_p* and b in Z_p
h_{a,b}(k) = ((a·k + b) mod p) mod m
H_{p,m} = { h_{a,b} : a in Z_p* and b in Z_p }
This family has a total of p(p-1) hash functions
This family of hash functions is universal
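
The family above is small enough to enumerate for toy parameters. The sketch below assumes integer keys smaller than p; the function names and the values p = 17, m = 6 are chosen only for illustration. It builds every h_{a,b} and counts how many of them collide on one pair of distinct keys, which the universality definition bounds by |H|/m.

```python
import random

def make_hash(a, b, p, m):
    """Return h_{a,b}(k) = ((a*k + b) mod p) mod m."""
    return lambda k: ((a * k + b) % p) % m

def random_hash(p, m, rng=random):
    """Pick a in {1..p-1} and b in {0..p-1} at random, independent of the keys."""
    a = rng.randrange(1, p)
    b = rng.randrange(0, p)
    return make_hash(a, b, p, m)

# Toy parameters (assumed for illustration): prime p, table size m < p.
p, m = 17, 6
family = [make_hash(a, b, p, m) for a in range(1, p) for b in range(p)]

# Count the functions in the family that map two distinct keys to the same slot.
k, q = 8, 3
collisions = sum(1 for h in family if h(k) == h(q))
```

In practice one would call random_hash once when the table is created and keep that function for the table's lifetime.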

21
Open addressing

Store all elements in the table
Probe the hash table in event of a collision
Key idea: the probe sequence is NOT the same for
each element; it depends on the key
h: U × {0, 1, ..., m-1} → {0, 1, ..., m-1}
Permutation requirement
h(k,0), h(k,1), ..., h(k,m-1) is a permutation of
(0, ..., m-1)

22
Operations

Insert, search straightforward
Why can we not simply mark a slot as deleted?
If keys need to be deleted, open addressing may
not be the right choice

23
Probing schemes

uniform hashing each of m! permutations equally
likely
not typically achieved
linear probing: h(k,i) = (h(k) + i) mod m
Clustering effect
Only m possible probe sequences are considered
quadratic probing: h(k,i) = (h(k) + c·i + d·i²) mod m
constraints on c, d, m
better than linear probing as clustering effect
is not as bad
Only m possible probe sequences are considered,
and keys that map to same position do have
identical probe sequences
double hashing: h(k,i) = (h(k) + i·q(k)) mod m
q(k) must be relatively prime to m
m² probe sequences considered
Much closer to uniform hashing
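
The three probe formulas can be written directly as functions. This is a sketch under assumptions: the helper names, the base hash h(k) = k mod m, the secondary hash q(k) = 1 + (k mod (m-1)), and the table size m = 7 are all illustrative choices. Picking m prime is one simple way to make every q(k) relatively prime to m.

```python
def linear_probe(h, k, i, m):
    """h(k,i) = (h(k) + i) mod m"""
    return (h(k) + i) % m

def quadratic_probe(h, k, i, m, c, d):
    """h(k,i) = (h(k) + c*i + d*i^2) mod m"""
    return (h(k) + c * i + d * i * i) % m

def double_hash_probe(h, q, k, i, m):
    """h(k,i) = (h(k) + i*q(k)) mod m"""
    return (h(k) + i * q(k)) % m

m = 7                                   # prime table size (assumed)
h = lambda k: k % m                     # base hash (assumed)
q = lambda k: 1 + (k % (m - 1))         # secondary hash in 1..m-1 (assumed)

k = 10
linear_seq = [linear_probe(h, k, i, m) for i in range(m)]
double_seq = [double_hash_probe(h, q, k, i, m) for i in range(m)]
```

Both sequences visit every slot exactly once, which is the permutation requirement for open addressing.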

24
Search time

Preliminaries
n elements, m slots, α = n/m with n < m
Assumption of uniform hashing
Expected search time on a miss
Given that h(k,i) is non-empty, what is the
probability that h(k,i+1) is empty?
What is expected search time then?
Expected insertion time is essentially the same.
Why?
Expected search time on a hit
If entry was the (i+1)th element added, expected
search time is at most 1/(1 − i/m) = m/(m − i)
Average this over all n elements (i = 0, ..., n−1)
and you get (1/α)(H_m − H_{m−n})
This can be bounded by (1/α) ln(1/(1 − α))
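
The hit-time calculation can be checked numerically. The sketch below assumes that i counts the elements already in the table when an entry was inserted, so i runs from 0 to n-1. The function names and the values n = 500, m = 1000 are illustrative.

```python
import math

def harmonic(k):
    """k-th harmonic number H_k = 1 + 1/2 + ... + 1/k (H_0 = 0)."""
    return sum(1.0 / j for j in range(1, k + 1))

def expected_hit_probes(n, m):
    """Average of m/(m - i) over the n insertions, i = 0..n-1."""
    return sum(m / (m - i) for i in range(n)) / n

n, m = 500, 1000
alpha = n / m

average = expected_hit_probes(n, m)
closed_form = (1 / alpha) * (harmonic(m) - harmonic(m - n))
bound = (1 / alpha) * math.log(1 / (1 - alpha))
```

The average and the closed form agree up to floating-point error, and both stay below the logarithmic bound.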

25
Binary search trees

Supports search, min, max, predecessor,
successor, insert, delete, and list all
efficiently
Thus can be used for more than just dictionary
applications
Basic tree property
For any node x
left subtree has nodes < x
right subtree has nodes > x

26
Binary Trees
27
Example Search Trees
28
Operations

Search procedure?
search time?
Minimum node in tree rooted at node x?
search time?
Maximum node in tree rooted at node x?
search time?
Listing all nodes in sorted order?
time to list?
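
One way to answer these questions is to write the procedures out. The sketch below assumes a plain, unbalanced tree with distinct keys; the Node class and function names are illustrative. Search, minimum, and maximum each follow a single root-to-leaf path, so they take time proportional to the height, while the in-order listing visits every node once.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Plain BST insert, used here only to build an example tree."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def search(x, key):
    """Walk down from x; O(height). Returns the node or None."""
    while x is not None and x.key != key:
        x = x.left if key < x.key else x.right
    return x

def minimum(x):
    """Leftmost node of the tree rooted at x; O(height)."""
    while x.left is not None:
        x = x.left
    return x

def maximum(x):
    """Rightmost node of the tree rooted at x; O(height)."""
    while x.right is not None:
        x = x.right
    return x

def inorder(x, out=None):
    """List all keys in sorted order; O(n)."""
    if out is None:
        out = []
    if x is not None:
        inorder(x.left, out)
        out.append(x.key)
        inorder(x.right, out)
    return out

root = None
for key in (15, 6, 18, 3, 7, 17, 20):
    root = insert(root, key)
```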

29
Successor and Predecessor
Successor: Find the minimal entry in the right
sub-tree, if there is a right sub-tree.
Otherwise find the first ancestor v such that the
entry is in v's left sub-tree.
Predecessor: Find the maximal entry in the left
sub-tree, if there is a left sub-tree.
Otherwise find the first ancestor v such that the
entry is in v's right sub-tree.
In either case, if the root node is reached
without finding such an ancestor, no
predecessor/successor exists.
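
The two rules translate almost line for line into code once each node stores a parent pointer. This is a sketch under that assumption, with distinct keys; the Node class and the insert and find helpers are illustrative scaffolding.

```python
class Node:
    def __init__(self, key, parent=None):
        self.key = key
        self.parent = parent
        self.left = None
        self.right = None

def insert(root, key):
    """Iterative BST insert that maintains parent pointers; returns the root."""
    if root is None:
        return Node(key)
    x = root
    while True:
        if key < x.key:
            if x.left is None:
                x.left = Node(key, x)
                return root
            x = x.left
        else:
            if x.right is None:
                x.right = Node(key, x)
                return root
            x = x.right

def find(x, key):
    while x is not None and x.key != key:
        x = x.left if key < x.key else x.right
    return x

def successor(x):
    if x.right is not None:
        # Minimal entry in the right sub-tree.
        y = x.right
        while y.left is not None:
            y = y.left
        return y
    # Climb until we arrive at an ancestor v from its left sub-tree.
    v = x.parent
    while v is not None and x is v.right:
        x, v = v, v.parent
    return v            # None if we climbed past the root

def predecessor(x):
    if x.left is not None:
        # Maximal entry in the left sub-tree.
        y = x.left
        while y.right is not None:
            y = y.right
        return y
    # Climb until we arrive at an ancestor v from its right sub-tree.
    v = x.parent
    while v is not None and x is v.left:
        x, v = v, v.parent
    return v            # None if we climbed past the root

root = None
for key in (15, 6, 18, 3, 7, 17, 20, 13):
    root = insert(root, key)
```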
30
Simple Insertion and Deletion
Insertion: Traverse the tree as you would when
searching. When the required branch does not
exist, attach the new entry at that location.
Deletion: Three possible cases exist
a) Entry is a leaf: Just delete it.
b) Entry has one child: Remove entry, replacing it
with its child.
c) Entry has two children: Replace entry with its
successor. Successor has at most one child
(why?); use step a or b on it.
31
Simple binary search trees

What is the expected height of a binary search
tree?
Difficult to compute if we allow both insertions
and deletions
With insertions only, the analysis of section 12.4
shows that the expected height is O(log n)

32
Tree-Balancing Algorithms

Red-Black Trees
Splay Trees
Others
AVL Trees
2-3 Trees and 2-3-4 Trees

33
Manipulating Search Trees
34
Red-Black Trees

All nodes in the tree are either red or black.
Every null-child is included and colored black.
All red nodes must have two black children.
Every path from the root to a leaf must have the
same number of black nodes.
How balanced a tree will this produce? How hard
will it be to maintain?
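
The listed properties can be checked mechanically. The sketch below is a validator only, not an insertion or rebalancing routine. It assumes a hypothetical RBNode class with a color field, treats a missing child (None) as a black null leaf, and does not check the search-tree ordering of the keys.

```python
RED, BLACK = "red", "black"

class RBNode:
    def __init__(self, key, color, left=None, right=None):
        self.key = key
        self.color = color
        self.left = left
        self.right = right

def black_height(node):
    """Number of black nodes on any path from node down to a null leaf,
    counting the null leaf. Raises ValueError if a property is violated."""
    if node is None:
        return 1                                    # null children are black
    if node.color not in (RED, BLACK):
        raise ValueError("node is neither red nor black")
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                raise ValueError("red node with a red child")
    left = black_height(node.left)
    right = black_height(node.right)
    if left != right:
        raise ValueError("paths with different numbers of black nodes")
    return left + (1 if node.color == BLACK else 0)

def is_red_black(root):
    """True if the tree satisfies the four properties listed above."""
    try:
        black_height(root)
        return True
    except ValueError:
        return False
```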

35
Example Red-Black Tree
36
Splay trees

No adjustment is done in a splay tree when nodes
are inserted or removed.
All rotations occur within the Search function -
the element being searched for is rotated to the
root of the tree.
Individual operations may take O(n) time
However, it can be shown that any sequence of m
operations including n insertions starting with
an empty tree takes O(m log n) time

37
Splay trees

Dynamic optimality conjecture: splay trees are as
asymptotically fast on any sequence of operations
as any other type of search tree with rotations.
What does this mean?
Worst case sequence of splay tree operations
takes amortized O(log n) time per operation
Some sequences of operations take less.
Accessing the same ten items over and over again
Splay tree should then take less on these
sequences as well.
One special case that has been proven:
when searching in order from the smallest key to
the largest key, the total time for all n
operations is O(n).