Data Types and Data Structures - PowerPoint PPT Presentation

Provided by: charle106
Learn more at: http://www.cse.msu.edu
1
Data Types and Data Structures
  • Data Types
  • Containers
  • Dictionaries
  • Priority Queue
  • Data Structures
  • Hash Tables
  • Binary Search Trees

2
Data types and structures
There are numerous data-structure options for many
commonly used abstract data types, such as
containers, dictionaries, and priority queues.
Changing data structures should not change the
correctness of a program, but it can have a
dramatic effect on its speed.
3
Choosing a Data Structure
It is important to choose the proper data
structure when you first design an
algorithm. There are many data structures that
can handle common operations: insertion,
deletion, sorting, searching, finding the maximum
or minimum, predecessor or successor,
etc. Different data structures take different
amounts of time for each of these operations.
4
Guidelines...
  • Building an algorithm around a properly chosen
    data structure leads to both a clean algorithm
    and good performance
  • Using an incorrect data structure can be
    disastrous, but you don't always need the best
    structure.
  • Sorting is at the heart of many good algorithms.
  • Common algorithm design paradigms include
    divide-and-conquer, randomization, incremental
    construction, and dynamic programming.

5
Fundamental Data Types
An abstract data type is a collection of
well-defined operations that can be performed on
a particular structure. Different data
structures make different tradeoffs that make
certain operations (say, insertion) faster at
the cost of others (say, searching). Often there
will be other considerations that make one
structure more desirable than others.
6
Containers
  • Hold data for later retrieval
  • Operations
      • Insert(item)
      • Retrieve(): typically removes an item from the
        container
  • Simple data structures for implementing
    containers
      • Stack: LIFO
      • Queue: FIFO
      • Table: retrieve by index
  • Implementation
      • Linked list or array
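A minimal Python sketch of the two simplest containers, using the slide's operation names Insert and Retrieve (the class names `Stack` and `Queue` are illustrative, not from the slides). A Python list serves as the array-backed stack; `collections.deque` serves as the queue, because popping from the front of a plain list costs O(n).

```python
from collections import deque


class Stack:
    """LIFO container: Retrieve returns the most recently inserted item."""

    def __init__(self):
        self._items = []

    def insert(self, item):
        self._items.append(item)   # amortized O(1) append at the end

    def retrieve(self):
        return self._items.pop()   # O(1) removal from the end

    def __len__(self):
        return len(self._items)


class Queue:
    """FIFO container: Retrieve returns the least recently inserted item."""

    def __init__(self):
        self._items = deque()

    def insert(self, item):
        self._items.append(item)      # O(1) at the back

    def retrieve(self):
        return self._items.popleft()  # O(1) at the front

    def __len__(self):
        return len(self._items)


s, q = Stack(), Queue()
for x in (1, 2, 3):
    s.insert(x)
    q.insert(x)
print(s.retrieve(), q.retrieve())  # 3 1
```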

7
Dictionaries
  • Dictionaries are a form of container that permits
    access to data items by content (key).
  • Operations
      • Insert(key)
      • Delete(pointer to item)
      • Search(key)
  • Linked list implementation (no sorting)
      • Insert: O(1)
      • Delete: O(1) given a pointer (doubly linked list)
      • Search: O(n)
  • Sorted array implementation
      • Insert: O(n)
      • Delete: O(n)
      • Search: O(log n)
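The two dictionary implementations can be sketched in Python as follows. The class names are hypothetical. The linked list is doubly linked, which is what makes Delete O(1) when the caller already holds a pointer to the node; the sorted array uses the standard `bisect` module for binary search, and its `delete` takes a key rather than a pointer for simplicity.

```python
import bisect


class _Node:
    def __init__(self, key):
        self.key = key
        self.prev = None
        self.next = None


class LinkedListDict:
    """Unsorted doubly linked list."""

    def __init__(self):
        self.head = None

    def insert(self, key):
        # O(1): the new node becomes the head; no order is maintained.
        node = _Node(key)
        node.next = self.head
        if self.head is not None:
            self.head.prev = node
        self.head = node
        return node  # the "pointer to item" that delete() expects

    def search(self, key):
        # O(n): linear scan from the head.
        node = self.head
        while node is not None and node.key != key:
            node = node.next
        return node

    def delete(self, node):
        # O(1): unlink using the prev/next pointers.
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next is not None:
            node.next.prev = node.prev


class SortedArrayDict:
    """Keys kept in a sorted Python list."""

    def __init__(self):
        self._keys = []

    def insert(self, key):
        # Binary search finds the position, but shifting elements costs O(n).
        bisect.insort(self._keys, key)

    def search(self, key):
        # O(log n) binary search.
        i = bisect.bisect_left(self._keys, key)
        return i < len(self._keys) and self._keys[i] == key

    def delete(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            del self._keys[i]  # shifting the tail: O(n)
```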

8
Priority Queues
Insert(x): Given an item x, insert it into the
priority queue.
Find-Maximum(): Return the item with the maximal
priority.
Delete-Maximum(): Remove the item whose key is
maximum from the queue.
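One common way to sketch these three operations in Python is on top of the standard `heapq` module. Since `heapq` is a min-heap, keys are negated; a running counter breaks ties so items themselves are never compared. The class name and the assumption that each item comes with a numeric key are illustrative.

```python
import heapq


class MaxPriorityQueue:
    """Max-priority queue built on heapq (a min-heap) by negating keys."""

    def __init__(self):
        self._heap = []
        self._count = 0  # insertion counter used as a tie-breaker

    def insert(self, key, item):
        heapq.heappush(self._heap, (-key, self._count, item))  # O(log n)
        self._count += 1

    def find_maximum(self):
        return self._heap[0][2]  # O(1): the root holds the largest key

    def delete_maximum(self):
        return heapq.heappop(self._heap)[2]  # O(log n)


pq = MaxPriorityQueue()
for key, item in [(3, "c"), (9, "i"), (5, "e")]:
    pq.insert(key, item)
print(pq.find_maximum())    # i
print(pq.delete_maximum())  # i
print(pq.find_maximum())    # e
```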
9
Data Structures
  • Ways to implement data types
  • Linked lists
  • Arrays with auxiliary data
  • Hash table
  • Binary search tree
  • Others, of course

10
Hash Tables
  • Maintain an array to hold your items
  • Hash the key to determine the index the
    specific item should be stored at
  • Good hash functions
  • Methods for dealing with collisions
      • Chaining
      • Universal hash functions
      • Open addressing

11
Direct-address hash table
  • Assumptions
      • Universe of keys is small (size m)
      • Set of keys can be mapped to $\{0, 1, \ldots, m-1\}$
      • No two elements have the same key
  • Use an array of size m
      • Array contents can be a pointer to the element
      • Array can directly store the element
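A direct-address table is little more than an array indexed by the key. This minimal Python sketch stores elements directly in the slots (the second option above) and assumes integer keys in {0, ..., m-1}; the class name is illustrative and `None` marks an empty slot.

```python
class DirectAddressTable:
    """Slot k holds the element whose key is k; every operation is O(1)."""

    def __init__(self, m):
        self.slots = [None] * m  # one slot per possible key

    def insert(self, key, value):
        self.slots[key] = value

    def search(self, key):
        return self.slots[key]   # None means no element with this key

    def delete(self, key):
        self.slots[key] = None


t = DirectAddressTable(10)
t.insert(3, "three")
print(t.search(3), t.search(4))  # three None
```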

12
Hash Functions
  • Problem with direct-addressed tables
      • Universe of possible keys U is too large
      • Set of keys actually used, K, may be much smaller
  • Hash function
      • Use an array of size $\Theta(m)$
      • Use a function $h(k) = x$ to determine slot x
      • $h: U \to \{0, 1, \ldots, m-1\}$
  • Collision
      • When $h(k_1) = h(k_2)$ for distinct keys $k_1 \neq k_2$

13
Good Hash Functions
  • Each key is equally likely to hash to any of the
    m slots independently of where any other key has
    hashed to
  • Difficult to achieve as this requires knowledge
    of distribution of keys
  • Good characteristics
  • Must be able to evaluate quickly
  • May want keys that are close to map to slots
    that are far apart

14
Hashing by Height
[Figure: keys hashed by height into slots 1 to 9]
15
Collisions unavoidable
Even if we have a good hash function, we will
still have collisions.
[Figure: twelve slots labeled Jan through Dec]
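To see why collisions are unavoidable, here is a quick calculation in Python. It assumes each of n keys hashes uniformly and independently into m = 12 slots (one per month); the function name is illustrative. The probability that all n keys land in distinct slots is the product of (m - i)/m for i = 0, ..., n-1, which drops quickly: already for n = 5 a collision is more likely than not.

```python
from fractions import Fraction


def prob_no_collision(n, m):
    """Probability that n uniformly hashed keys occupy n distinct slots out of m."""
    p = Fraction(1)
    for i in range(n):
        p *= Fraction(m - i, m)  # key i must avoid the i slots already taken
    return p


for n in range(2, 7):
    print(n, round(float(1 - prob_no_collision(n, 12)), 3))
```

Exact fractions avoid rounding error; with n = 13 keys the probability is exactly 0, the pigeonhole principle.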
16
Chaining
  • Create a linked list to store all elements that
    map to the same table slot
  • Running time
      • Insert(T,k): how long? What assumptions?
      • Search(T,k): how long?
      • Delete(T,x), given a pointer to element x: how
        long? What assumptions?
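A minimal Python sketch of chaining, in which each slot holds a Python list as its chain. Python's built-in `hash()` reduced mod m stands in for h(k), and the class name is hypothetical. Unlike the slide's Delete(T,x), which is handed a pointer and can unlink in O(1) from a doubly linked list, this `delete` takes a key and scans the chain.

```python
class ChainedHashTable:
    """Hash table with chaining; slot j holds every (key, value) with h(key) = j."""

    def __init__(self, m=8):
        self.m = m
        self.table = [[] for _ in range(m)]

    def _slot(self, key):
        return hash(key) % self.m  # stand-in for h(k)

    def insert(self, key, value):
        # O(1), assuming the key is not already present (no duplicate check).
        self.table[self._slot(key)].append((key, value))

    def search(self, key):
        # Time proportional to the length of the chain in slot h(key).
        for k, v in self.table[self._slot(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self.table[self._slot(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False
```

With n stored items the average chain length is the load factor n/m, which is what drives the expected search time.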

17
Search time
  • Notation
      • n items
      • m slots
      • load factor $\alpha = n/m$
  • Worst-case search time?
      • What is the worst case?
  • Expected search time
      • Simple uniform hashing: each element is equally
        likely to hash to any of the m slots, independent
        of where any other element has hashed to.
      • Expected search time?

18
Universal hashing
  • In the worst case, for any fixed hash function,
    the keys may be exactly the worst case for that
    function
  • Avoid this by choosing the hash function randomly,
    independent of the keys to be hashed
  • Key distinction from probabilistic analysis:
      • A universal hash function will work well with
        high probability on EVERY input instance, but
        may perform poorly with low probability on
        EVERY input instance
      • Probabilistic analysis of a static hash function
        h says h will work well on most input instances
        every time, but may perform poorly on some input
        instances every time

19
Definition and analysis
  • Let H be a finite collection of hash functions
    that map U into $\{0, \ldots, m-1\}$
  • This collection is universal if for each pair of
    distinct keys k and q in U, the number of hash
    functions h in H for which $h(k) = h(q)$ is at most
    $|H|/m$.
  • If we choose our hash function randomly from H,
    this implies that there is at most a 1/m chance
    that $h(k) = h(q)$.
  • This leads to the expected length of a chain
    being n/m
  • Note that the analysis assumes chaining, not open
    addressing

20
An example of universal hash functions
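  • Choose a prime p larger than all possible keys
  • Let $\mathbb{Z}_p = \{0, \ldots, p-1\}$ and $\mathbb{Z}_p^* = \{1, \ldots, p-1\}$
  • Clearly $p > m$. Why?
  • Define $h_{a,b}$ for any $a \in \mathbb{Z}_p^*$ and $b \in \mathbb{Z}_p$:
      • $h_{a,b}(k) = ((ak + b) \bmod p) \bmod m$
  • $\mathcal{H}_{p,m} = \{h_{a,b} : a \in \mathbb{Z}_p^*,\ b \in \mathbb{Z}_p\}$
  • This family has a total of $p(p-1)$ hash functions
  • This family of hash functions is universal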
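The family above is small enough to check by brute force. This Python sketch draws one $h_{a,b}$ at random and, separately, counts how many of the $p(p-1)$ functions make two given keys collide; universality says the count is at most $|H|/m$. The function names and the toy parameters p = 17, m = 6 are illustrative assumptions.

```python
import random


def make_universal_hash(p, m, rng=random):
    """Pick h_{a,b}(k) = ((a*k + b) mod p) mod m uniformly from H_{p,m}.

    Assumes p is a prime larger than every key and m < p.
    """
    a = rng.randrange(1, p)  # a in Z_p^* = {1, ..., p-1}
    b = rng.randrange(0, p)  # b in Z_p   = {0, ..., p-1}
    return lambda k: ((a * k + b) % p) % m


def collision_count(p, m, k, q):
    """Number of functions h_{a,b} in H_{p,m} with h(k) == h(q)."""
    count = 0
    for a in range(1, p):
        for b in range(p):
            if ((a * k + b) % p) % m == ((a * q + b) % p) % m:
                count += 1
    return count


p, m = 17, 6
size = p * (p - 1)           # |H_{p,m}| = p(p-1) = 272
c = collision_count(p, m, 3, 11)
print(c <= size / m)         # True
```

Randomness enters only through the choice of a and b, never through the keys, which is the point of the previous slide.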

21
Open addressing
  • Store all elements in the table itself
  • Probe the hash table in the event of a collision
  • Key idea: the probe sequence is NOT the same for
    each element; it depends on the initial key
  • $h: U \times \{0, 1, \ldots, m-1\} \to \{0, 1, \ldots, m-1\}$
  • Permutation requirement
      • $\langle h(k,0), h(k,1), \ldots, h(k,m-1) \rangle$ is a
        permutation of $\langle 0, \ldots, m-1 \rangle$

22
Operations
  • Insert, search straightforward
  • Why can we not simply mark a slot as deleted?
  • If keys need to be deleted, open addressing may
    not be the right choice

23
Probing schemes
  • Uniform hashing: each of the m! permutations is
    equally likely
      • not typically achieved
  • Linear probing: $h(k,i) = (h(k) + i) \bmod m$
      • Clustering effect
      • Only m possible probe sequences are considered
  • Quadratic probing: $h(k,i) = (h(k) + ci + di^2) \bmod m$
      • Constraints on c, d, m
      • Better than linear probing, as the clustering
        effect is not as bad
      • Only m possible probe sequences are considered,
        and keys that map to the same position have
        identical probe sequences
  • Double hashing: $h(k,i) = (h(k) + i\,q(k)) \bmod m$
      • q(k) must be relatively prime to m
      • $\Theta(m^2)$ probe sequences considered
      • Much closer to uniform hashing
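The three probing schemes can be sketched in Python as functions of (k, i) plugged into one open-addressing table. Assumptions, all illustrative: keys are integers; for quadratic probing we pick c = d = 1/2 with m a power of 2, which is one known choice that makes the probe sequence a permutation; for double hashing we use m = 13, h(k) = k mod 13 and q(k) = 1 + (k mod 11), so q(k) is always relatively prime to 13. Deletion is omitted, in line with the previous slide.

```python
# Probe functions h(k, i); h1 plays the role of the auxiliary hash h(k).
def linear_probe(h1, m):
    return lambda k, i: (h1(k) + i) % m


def quadratic_probe(h1, m):
    # c = d = 1/2: offsets i(i+1)/2 cover every slot when m is a power of 2.
    return lambda k, i: (h1(k) + i * (i + 1) // 2) % m


def double_hash_probe(h1, q, m):
    # q(k) must be relatively prime to m, or some slots are never probed.
    return lambda k, i: (h1(k) + i * q(k)) % m


class OpenAddressTable:
    def __init__(self, m, probe):
        self.m = m
        self.probe = probe
        self.slots = [None] * m

    def insert(self, key):
        for i in range(self.m):
            j = self.probe(key, i)
            if self.slots[j] is None:
                self.slots[j] = key
                return j
        raise OverflowError("hash table overflow")

    def search(self, key):
        for i in range(self.m):
            j = self.probe(key, i)
            if self.slots[j] is None:
                return None  # an empty slot ends the probe sequence: key absent
            if self.slots[j] == key:
                return j
        return None


m = 13
t = OpenAddressTable(m, double_hash_probe(lambda k: k % m, lambda k: 1 + k % 11, m))
for key in (79, 69, 98, 72, 50):
    t.insert(key)
print(t.insert(14))  # 5: slot 1 holds 79, so the next probe is (1 + 4) mod 13
```

Because `search` stops at the first empty slot, simply emptying a slot on deletion would cut off the probe sequences of keys stored past it.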

24
Search time
  • Preliminaries
      • n elements, m slots, $\alpha = n/m$ with $n < m$
      • Assumption of uniform hashing
  • Expected search time on a miss
      • Given that slot h(k,i) is non-empty, what is the
        probability that h(k,i+1) is empty?
      • What is the expected search time, then?
      • Expected insertion time is essentially the same.
        Why?
  • Expected search time on a hit
      • If the entry was added when i elements were
        already in the table, its expected search time
        is $1/(1 - i/m) = m/(m-i)$
      • Averaging this over all n entries (i = 0, ..., n-1)
        gives $\frac{1}{\alpha}(H_m - H_{m-n})$
      • This can be bounded by $\frac{1}{\alpha}\ln\frac{1}{1-\alpha}$
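The hit analysis can be checked numerically. This Python sketch (function names illustrative) computes the average of m/(m - i) over i = 0, ..., n-1 directly, the harmonic-number closed form $\frac{1}{\alpha}(H_m - H_{m-n})$, and the bound $\frac{1}{\alpha}\ln\frac{1}{1-\alpha}$. The first two agree exactly, and the bound holds because each term 1/j is at most the integral of 1/x from j-1 to j.

```python
import math


def harmonic(k):
    """H_k = 1 + 1/2 + ... + 1/k, with H_0 = 0."""
    return sum(1.0 / j for j in range(1, k + 1))


def expected_probes_hit(n, m):
    """Average over the n stored keys of m/(m - i), i = keys present at its insertion."""
    return sum(m / (m - i) for i in range(n)) / n


def closed_form(n, m):
    alpha = n / m
    return (harmonic(m) - harmonic(m - n)) / alpha


def bound(n, m):
    alpha = n / m
    return math.log(1.0 / (1.0 - alpha)) / alpha


for n, m in [(50, 100), (90, 100), (99, 100)]:
    print(f"alpha={n / m:.2f}  average={expected_probes_hit(n, m):.3f}  "
          f"closed form={closed_form(n, m):.3f}  bound={bound(n, m):.3f}")
```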

25
Binary search trees
  • Supports search, min, max, predecessor,
    successor, insert, delete, and list all
    efficiently
  • Thus can be used for more than just dictionary
    applications
  • Basic tree property
      • For any node x:
          • the left subtree has nodes < x
          • the right subtree has nodes > x

26
Binary Trees
27
Example Search Trees
28
Operations
  • Search procedure?
  • search time?
  • Minimum node in tree rooted at node x?
  • search time?
  • Maximum node in tree rooted at node x?
  • search time?
  • Listing all nodes in sorted order?
  • time to list?
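One way to answer these questions is with a short Python sketch, assuming distinct keys and a plain, unbalanced tree; the helper names are illustrative. Search, minimum and maximum each follow a single downward path, so they take O(h) time for a tree of height h, while an in-order walk visits every node once, Θ(n), and produces the keys in sorted order.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None


def insert(node, key):
    """Helper used to build the demo tree (no balancing)."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    return node


def search(x, key):
    # O(h): go left or right at each node, depending on the comparison.
    while x is not None and key != x.key:
        x = x.left if key < x.key else x.right
    return x


def minimum(x):
    # O(h): the minimum is the leftmost node.
    while x.left is not None:
        x = x.left
    return x


def maximum(x):
    # O(h): the maximum is the rightmost node.
    while x.right is not None:
        x = x.right
    return x


def inorder(x, out):
    # Theta(n): left subtree, then the node, then the right subtree.
    if x is not None:
        inorder(x.left, out)
        out.append(x.key)
        inorder(x.right, out)
    return out


root = None
for k in [15, 6, 18, 3, 7, 17, 20]:
    root = insert(root, k)
print(search(root, 7).key, minimum(root).key, maximum(root).key)  # 7 3 20
print(inorder(root, []))  # [3, 6, 7, 15, 17, 18, 20]
```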

29
Successor and Predecessor
Successor: If there is a right sub-tree, the
successor is the minimal entry in it. Otherwise,
find the first ancestor v such that the entry is
in v's left sub-tree; v is the successor.
Predecessor: If there is a left sub-tree, the
predecessor is the maximal entry in it. Otherwise,
find the first ancestor v such that the entry is
in v's right sub-tree; v is the predecessor.
In either case, if we move up past the root
without finding such an ancestor, no
predecessor/successor exists.
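The rule above translates directly into code once each node keeps a parent pointer. This Python sketch assumes distinct keys; the class and function names are illustrative. Climbing while x is a right child stops exactly at the first ancestor whose left sub-tree contains the entry, and running off the top of the tree returns `None`.

```python
class Node:
    def __init__(self, key, parent=None):
        self.key = key
        self.parent = parent
        self.left = None
        self.right = None


def insert(root, key):
    """Plain BST insertion that records parent pointers; returns the root."""
    if root is None:
        return Node(key)
    x = root
    while True:
        side = "left" if key < x.key else "right"
        child = getattr(x, side)
        if child is None:
            setattr(x, side, Node(key, x))
            return root
        x = child


def find(root, key):
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root


def successor(x):
    if x.right is not None:       # minimal entry of the right sub-tree
        x = x.right
        while x.left is not None:
            x = x.left
        return x
    p = x.parent                  # climb until x lies in p's left sub-tree
    while p is not None and x is p.right:
        x, p = p, p.parent
    return p                      # None: moved past the root, no successor


def predecessor(x):
    if x.left is not None:        # maximal entry of the left sub-tree
        x = x.left
        while x.right is not None:
            x = x.right
        return x
    p = x.parent                  # climb until x lies in p's right sub-tree
    while p is not None and x is p.left:
        x, p = p, p.parent
    return p                      # None: moved past the root, no predecessor


root = None
for k in [15, 6, 18, 3, 7, 17, 20]:
    root = insert(root, k)
print(successor(find(root, 7)).key)     # 15
print(predecessor(find(root, 17)).key)  # 15
print(successor(find(root, 20)))        # None
```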
30
Simple Insertion and Deletion
Insertion: Traverse the tree as you would when
searching. When the required branch does not
exist, attach the new entry at that location.
Deletion: Three cases are possible:
a) The entry is a leaf: just delete it.
b) The entry has one child: remove the entry,
replacing it with its child.
c) The entry has two children: replace the entry
with its successor. The successor has at most one
child (why?), so remove it from its old position
using case a or b.
31
Simple binary search trees
  • What is the expected height of a binary search
    tree?
  • Difficult to compute if we allow both insertions
    and deletions
  • With insertions only (in random order), the
    analysis of Section 12.4 shows that the expected
    height is O(log n)

32
Tree-Balancing Algorithms
  • Red-Black Trees
  • Splay Trees
  • Others
  • AVL Trees
  • 2-3 Trees and 2-3-4 Trees

33
Manipulating Search Trees
34
Red-Black Trees
  • All nodes in the tree are either red or black.
  • Every null child is included and colored black.
  • Every red node must have two black children.
  • Every path from the root to a leaf must have the
    same number of black nodes.
  • How balanced a tree will this produce? How
    hard will it be to maintain?
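The properties above are easy to check recursively, which also hints at the balance question: they force a red-black tree with n internal nodes to have height at most 2 log2(n + 1). This Python sketch (the names are illustrative) validates the coloring rules and returns the common black count, treating every null child as a black leaf; it does not check the search-tree ordering.

```python
class RBNode:
    def __init__(self, key, color, left=None, right=None):
        self.key, self.color, self.left, self.right = key, color, left, right


def black_height(node):
    """Black nodes on every path from node down to a null child (nulls count as black).

    Raises ValueError if a red-black coloring property is violated.
    """
    if node is None:
        return 1                                  # null child: colored black
    if node.color not in ("red", "black"):
        raise ValueError(f"node {node.key} is neither red nor black")
    if node.color == "red":
        for child in (node.left, node.right):
            if child is not None and child.color == "red":
                raise ValueError(f"red node {node.key} has a red child")
    lh = black_height(node.left)
    rh = black_height(node.right)
    if lh != rh:
        raise ValueError(f"unequal black counts below {node.key}")
    return lh + (1 if node.color == "black" else 0)


good = RBNode(10, "black", RBNode(5, "red"), RBNode(15, "red"))
print(black_height(good))  # 2

bad = RBNode(10, "black", RBNode(5, "black"), None)
try:
    black_height(bad)
except ValueError as err:
    print("invalid:", err)  # invalid: unequal black counts below 10
```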

35
Example Red-Black Tree
36
Splay trees
  • A splay tree keeps no balance information, and no
    separate rebalancing pass is made when nodes are
    inserted or removed.
  • All rotations occur within the splay operation:
    the element being searched for is rotated to the
    root of the tree (Insert and Delete splay as well).
  • Individual operations may take O(n) time.
  • However, it can be shown that any sequence of m
    operations, including n insertions, starting with
    an empty tree takes O(m log n) time.

37
Splay trees
  • Dynamic optimality conjecture: splay trees are
    asymptotically as fast on any sequence of
    operations as any other type of search tree with
    rotations.
  • What does this mean?
      • A worst-case sequence of splay tree operations
        takes amortized O(log n) time per operation.
      • Some sequences of operations take less, e.g.
        accessing the same ten items over and over again.
      • A splay tree should then take less on these
        sequences as well.
  • One special case that has been proven:
      • When searching in order from the smallest key to
        the largest key, the total time for all n
        operations is O(n).

38
Splay Tree Example
39
Splay Tree Example
40
Specialized Data Structures
  • Strings
  • Geometric shapes
  • Graphs
  • Sets
  • Schedules