Chapter 6: - PowerPoint PPT Presentation

About This Presentation

Title:

Chapter 6:

Description:

6.1 Binnary Tree The Bin Tree ... return false; } Exmple 3 : more effienciently * class Resp { int min, max; boolean ok; Resp(int x, int y, boolean z){ min = x; max ... – PowerPoint PPT presentation

Number of Views:29

Avg rating:3.0/5.0

Slides: 43

Provided by: Peter1571

Category:

Tags: chapter

more less

Transcript and Presenter's Notes

Title: Chapter 6:

1

Chapter 6 Searching trees and more Sorting
Algorithms
6.1 Binnary Tree
The Bin Tree class with traversing methods
6.2 Searching Trees
6.2.1 AVL Trees
6.3 HeapSort and BucketSort
6.3.1 HeapSort
6.3.2 BucketSort

2
Addendum A Webseite with animations for
AVL-trees http//www.seanet.com/users/arsen/
avltree.html A Webseite with animation for
Heapsort http//ciips.ee.uwa.edu.au/morris/
Year2/PLDS210/heapsort.html
3
b
First rotation
a
c
W
Z
x
y
new
a
Second rotation
b
W
c
x
new
y
Z
4
6.3 BucketSort

All sorting procedures we have seen so far are
based on the comparission of two keys
The general bottom bound for the cost for this
kind of procedures is
O(n log n).
For certain sets of keys
Sorting without comparing keys and more efficient
!

5
Idea use the keys to calculate the the storing
addresses for the elements of the sequence to be
sorted (like in Hashing).

Example (ideal situation, not frequent)
Set of n data objects s0, ... , sn-1 with key
values 0, ..., n-1, without duplicates given as
an array S.
Sorting algoritm
for(int i 0, i lt n, i)
TSi.key Si
cost O(n).

6
BucketSort

Sets of n data objects s0, ... , s n-1 with key
values 0, ..., m-1, given as array S.
duplicate keys are allowed.
void BucketSort(S)
int i int j
for(j0 jltm j)
Bj null //the buckets, lists
for(i0 iltn i)
insert(Si, BSi.key() )
for(j0 jltm j) output(Bj)
cost O(nm).

7
RadixSort

Sets of n data objects s0, ... , sn-1 with key
values
0, ..., nk -1, given as an array S. Duplicate
keys allowed.
The bucketsort for that would take O(n nk).
Making it better (RadixSort)
Write the keys on base n. We have numbers of k
ciphers
Run k times the BucketSort algorithm sorting the
objects according to each cipher, in order,
starting from the less significant cipher
(last?)until the most significant one (first?)
(e.g. using mod and div).
cost O(kn).

8
Example for RadixSort

n10, k2.
Sequence to be sorted
64, 17, 3, 99, 79, 78, 19, 13, 67, 34.
1. step insert them in buckets according to the
last cipher
after that, output them in the order they are
3, 13, 64, 34, 17, 67, 78, 99, 79, 19

0 1 2 3 4 5 6 7 8 9
3 13 64 34 17 67 78 99 79 19
9
Continuation RadixSort

2nd. step the sequence obtained from the step 1
3, 13, 64, 34, 17, 67, 78, 99, 79, 19
Insert in the buckets according to the
penultimate cipher
and output them
3, 13, 17, 19, 34, 64, 67, 78, 79, 99.

0 1 2 3 4 5 6 7 8 9
3 13 17 19 34 64 67 78 79 99
10
Generalizing

Ciphers in different possitions can have a
different value range.
Example Date(year, month, day)
( 0..9999, 1..12, 1..31 )
BucketSort the dates according to day, month and
year.

11
General things about binary trees

They are recursive structures, this means, many
algorithms over them are better (shorter, more
elegant) expressed in a recursive way
This means, in most cases it is necessary to
execute recursively the algorithm on one or both
sub-trees and analyze the root node (the order
may vary according to the task)
(one of) The base case(s) (when there is no
recursive call any more) is when the pointer to
the root of the
(sub-)tree is null (empty tree)
To improve efficiency we can avoid recursive
calls when there is no child

12
Example 1 search
Node search(int x, Node y) //returns a
pointer to the node containing y //null if it
is not in the tree if (y null) return
null if (y.key x) return y if (y.key
gt x) return search(x, y.left) return
search(x, y.right)
13
Exmple 2 count
int count(Node y) //returns the number of
nodes in the tree if (y null) return 0
int a count(y.right) int b
count(y.left) return a b 1 //
return count(y.right)count(y.left)1
14
Exmple 3 check if search tree
boolean isBST(Node y) //returns true if the
tree is a //binary search tree if (y
null) return true if (y.right null
y.left null) return true if
(!isBST(y.left) !isBST(y.right))
return false if (y.left ! null
max(y.left) lt y.key y.right ! null
min(y.right) gt y.key) return true
return false
15
Exmple 3 more effienciently
class Resp int min, max boolean ok
Resp(int x, int y, boolean z) min x max y
ok z resp isBST(Node y) if (y
null) return new Resp(0,0,true) Resp a
null, b null c new Resp(y.key, y.key,
true) if (y.left ! null) a
isBST(y.left) if (y.right ! null) b
isBST(y.right) if ( a ! null b ! null)
c.min a.min c.max b.max c.ok
a.ok b.ok a.max lt y.key b.min gt y.key
if (a ! null b null) c.min
a.min c.ok a.ok a.max lt y.key if ( a
null b ! null) c.max b.max c.ok
b.ok b.min gt y.key return c
16
Chapter 7 Selected Algorithms

7.1 External Search

17
7.1 External Search

The algorithms we have seen so far are good when
all data are stored in primary storage device
(RAM). Its access is fast(er)
Big data sets are frequently stored in secondary
storage devices (hard disk). Slow(er) access
(about 100-1000 times slower)
Access always to a complete block (page) of
data (4096 bytes), which is stored in the RAM
For efficiency keep the number of accesses to
the pages low!

For external search a variant of search trees
1 node 1 page
Multiple way search trees!

Definition (Multiple way-search trees)
An empty tree is a multiple way search tree with
an empty set of keys .
Be T0, ..., Tn multiple way-search trees with
keys taken from a common key set S, and be
k1,...,kn a sequence of keys with k1 lt ...lt kn.
Then is the sequence
T0 k1 T1 k2 T2 k3 .... kn Tn
a multiple way-search trees only when
for all keys x from T0 x lt k1
for i1,...,n-1, for all keys x in Ti, ki lt x lt
ki1
for all keys x from Tn kn lt x

20
B-Tree

Definition 7.1.2
A B-Tree of Order m is a multiple way tree with
the following characteristics
1 ? (keys in the root) ? 2m and
m ? (keys in the nodes) ? 2m
for all other nodes.
All paths from the root to a leaf are equally
long.
Each internal node (not leaf) which has s keys
has exactly s1 children.

21
Example a B-tree of order 2
22
Assessment of B-trees

The minimal possible number of nodes in a B-tree
of order m and height h
Number of nodes in each sub-tree
1 (m1) (m1)2 .... (m1)h-1
( (m1)h 1) / m.
The root of the minimal tree has only one key and
two children, all other nodes have m keys.
Altogether number of keys n in a B-tree of
height h
n ? 2 (m1)h 1
Thus the following holds for each B-tree of
height h with n keys
h ? logm1 ((n1)/2) .

23
Example

The following holds for each B-tree of height h
with n keys
h ? logm1 ((n1)/2).
Example for
Page size 1 KByte and
each entry plus pointer 8 bytes,
If we chose m63, and for an ammount of data of
n 1 000 000
We have h ? log 64 500 000.5 lt 4 and with
that hmax 3.

24
Algorithms for searching keys in a B-tree

Algorithm search(r, x)
//search for key x in the tree having as root
node r
//global variable p
in r, search for the first key y gt x or
until no more keys
if y x stop search, p r, found
else
if r a leaf stop search, p r, not found
else
if not past last key search(pointer to
node before y, x)
else search(last pointer, x)

25
Algorithms for inserting and deleting of keys in
a B-tree

Algorithm insert (r, x)
//insert key x in the tree having root r
search for x in tree having root r
if x was not found
be p the leaf where the search stopped
insert x in the right position
if p now has 2m1 keys
overflow(p)

26
Algorithm Split (1)

Algorithm
overflow (p) split (p)
Algorithm split (p)
first case p has a parent q.
Divide the overflowed node. The key of the middle
goes to the parent.
remark the splitting may go up until the root,
in which case the height of the tree is
incremented by one.

27
Algorithm Split (2)

Algorithm split (p)
second case p is the root.
Divide overflowed node. Open a new level above
containing a new root with the key of the middle
(root has one key).

28
Algorithm delete (r,x)

//delete key x from tree having root r
search for x in the tree with root r
if x found
if x is in an internal node
exchange x with the next bigger key x' in
the tree
// if x is in an internal node then there
must
// be at least one bigger number in the
tree
//this number is in a leaf !
be p the leaf, containing x
erase x from p
if p is not in the root r
if p has m-1 keys
underflow (p)

29
Algorithm underflow (p)

if p has a neighboring node with sgtm nodes
balance (p,p')
else
// because p cannot be the root, p must
have a neighbor with m keys
be p' the neighbor with m keys merge
(p,p')

30
Algorithm balance (p, p') // balance node p with
its neighbor p' (s gt m , r ?(ms)/2? -m )
31
Algorithm merge (p,p') // merge node p with its
neighbor perform the following operation

afterwards
if( q ltgt root) and (q has m-1 keys) underflow
(q)
else (if(q root) and (q empty)) free q let root
point to p

32
Recursion

If when performing underflow we have to perform
merge, we might have to perform underflow again
one level up
This process might be repeated until the root.

33
ExampleB-Tree of order 2 (m 2)
34
Cost

Be m the order of the B-tree,
n the number of keys.
Costs for search , insert and delete
O(h) O(logm1 ((n1)/2) )
O(logm1(n)).

35
Remark

B-trees can also be used as internal storage
structure
Especially B-trees of order 1
(then only one or 2 keys in each node
no elaborate search inside the nodes).
Cost of search, insert, delete
O(log n).

36
Remark use of storage memory

Over 50
reason the condition
1/2k ? (keys in the node) ? k
For nodes ? root
(k2m)

Even higher usage ratio of memory is possible to
achieve with the following condition ( 66)
2/3k ? (keys in nodes) ? k
For all nodes and their children
This can be reached by 1) modified balancing also
when inserting 2) split only then, when 2
neighbors are full.
Drawback More frequent reorganization is
necessary when inserting and deleting.
.

38
7.2 External Sorting

Problem Sorting big amount of data, as in
external searching, stored in blocks (pages).
efficiency number of the access to pages should
be kept low!
Strategy Sorting algorithm which processes the
data sequentially (no frequent page exchanges)
MergeSort!

Start n data in a file g1,
divided in pages of size b
Page 1 s1,,sb
Page 2 sb1,s2b
Page k s(k-1)b1 ,,sn
( k n/b )
When sequentially processed only k page accesses
instead of n.

40
Variation of MergeSort for external sorting

MergeSort Divide-and-Conquer-Algorithm
for external sorting without divide-step,
only merge.
Definition run ordered subsequence within a
file.
Strategy by merging increasingly generated runs
until everything is sorted.

41
Algorithm

1. Step Generate from the sequence in the input
file g1
starting runs and distribute them in two
files f1 and f2,
with the same number of runs (?1) in each.
(for this there are many strategies, later).
Now use four files f1, f2, g1, g2.

2. Step (main step)
While the number of runs gt 1 repeat
Merge each two runs from f1 and f2 to a double
sized run alternating to g1 und g2, until there
are no more runs in f1 and f2.
Merge each two runs from g1 and g2 to a double
sized run alternating to f1 and f2, until there
are no more runs in g1 und g2.
Each loop two phases