B-Trees: Balanced Trees for Use with Random Access Secondary Storage

Slides: 40
Provided by: gregory119
Learn more at: https://cs.hofstra.edu
1
B-Trees: Balanced Trees for Use with Random
Access Secondary Storage
  • Gerda Kamberova
  • Department of Computer Science
  • Hofstra University

2
Overview
  • Dynamic Set/Dictionary on a Disk Drive: B-trees
  • Memory
  • Motivation
  • Memory hierarchy
  • Impact of memory organization on the running time
    of algorithms
  • B-trees
  • Definition and examples
  • Bounding the height of a B-tree
  • Operations on a B-tree: search, insert, delete

3
Memory Hierarchy
  • Up to now we assumed that reads and writes go
    from/to main memory and that each takes a fixed,
    minimal amount of time to complete
  • Some applications deal with huge amounts of
    data that cannot all fit into the main memory
  • analysis of scientific data,
  • processing financial transactions,
  • organization and maintenance of databases
  • telephone directories,
  • library catalogs, etc.
  • B-Trees are balanced search trees designed to
    work well on direct access secondary storage
    devices (minimizing disk I/O operations)

4
Memory Hierarchy
  • Computers have a hierarchy of different memories
    which vary in size and speed (listed in order of
    increasing size and decreasing speed)
  • CPU registers: smallest and fastest
  • Cache: larger, about an order of magnitude slower
    than registers
  • Main memory (RAM): about 2 orders of magnitude
    slower than cache
  • SRAM
  • DRAM
  • Disks: 100,000 to 1,000,000 times slower than
    main memory
  • The OS supports general mechanisms that allow most
    memory accesses to be fast. The mechanisms are
    based on the locality-of-reference property of
    most software.

5
Locality-of-Reference and Memory Access
  • Locality-of-reference
  • Temporal locality (TL): if a program accesses a
    certain memory location now, it is likely it will
    access it again in the near future
  • Spatial locality (SL): if a program accesses a
    certain memory location now, it is likely it will
    access other close-by locations in the near future
  • Caching and Blocking
  • design choices for two-level memory systems
  • present in the interfaces
  • between main memory and cache memory
  • between external memory and main memory
  • Caching: motivated by TL
  • bring data from main memory into the cache, hoping
    that they will be needed soon, so that the
    response will be faster than going to main memory
  • Data are accessed in blocks called cache lines
  • Blocking: motivated by SL
  • If location x is required from secondary
    memory, bring to main memory not only the data from
    x, but also the data from locations close to x
  • Data are accessed in blocks, called pages (disk
    blocks).

6
Implications of Locality-of-Reference
  • In addition, blocking for external memory is
    motivated by hardware characteristics of external
    storage devices
  • By using blocking, the secondary memory is
    perceived as much faster than it is.
  • Implications of locality-of-reference for
    programmers.
  • The programmer usually does not have to be overly
    concerned with the memory hierarchy and how blocking
    and caching are implemented; still, one should try
    to
  • Use TL: if an algorithm calls for several
    accesses to the same variable, try to group these
    accesses as close together as possible in execution
    order.
  • Use SL: if an algorithm calls for accessing a
    certain location x in an array or a certain field
    in an object, try to group accesses to locations
    spatially close to x as close together as possible in
    execution order
  • When selecting an algorithm, can we take
    advantage of locality of reference?

7
Dynamic Set on Secondary Storage
  • Goal: minimize the disk accesses needed to perform
    searches or updates.
  • It is preferable to do many main memory accesses
    instead of one disk access.
  • Disk access complexity of various
    implementations of a dynamic set
  • Use the number of pages (blocks) read from disk as
    a crude approximation of the time spent accessing
    the disk
  • Doubly linked list: search takes O(n) accesses,
    since each successive link may require a different
    block
  • Sorted array: search is O(log n), but insert and
    delete still require Theta(n/B) accesses.
  • Balanced BST, skip lists, or other structures with
    logarithmic worst-case times: each accessed
    node may be in a different block, so O(log n)
    accesses.
  • B-Trees: O(log n / log B)
  • Idea: trade 1 slow disk access for O(B) very
    fast main-memory operations, where B is the block
    size.
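
To make the comparison above concrete, here is a rough sketch that plugs sample numbers into the three bounds. The function name `disk_accesses` and the values n = 2^30 and B = 1024 are illustrative assumptions, not figures from the slides, and constant factors are ignored.

```python
import math

def disk_accesses(n, B):
    """Rough worst-case page reads for one search among n keys (constants ignored)."""
    return {
        "linked list": n,                                   # one block per link followed
        "balanced BST": math.ceil(math.log2(n)),            # one block per node visited
        "B-tree": math.ceil(math.log2(n) / math.log2(B)),   # log n / log B
    }

# Hypothetical sizes: about a billion keys, 1024 keys per block.
estimate = disk_accesses(n=2**30, B=2**10)
for name, reads in estimate.items():
    print(f"{name:>12}: about {reads} page reads")
```

With these assumed sizes the B-tree needs on the order of 3 page reads where a balanced binary tree needs about 30.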

8
B-Trees
  • Balanced search trees designed to work well on
    data stored on disks. Multiple keys are stored
    sorted in a node. If a node keeps m keys, it has
    m+1 children.
  • Property: an n-node B-tree has height O(log n)
  • The max branching factor (BF) depends on the disk
    block size.
  • For large B-trees stored on disk, a branching
    factor (BF) between 50 and 2000 is often used.

[Figure: example B-tree of height 2. Root(T) is (M); its children are (D H) and (Q T X); the leaves are (B C), (F G), (J K L) | (N P), (R S), (V W), (Y Z).]
9
Conventions
  • Modify the pseudocode language by adding
  • DiskRead(x): reads the page containing object x
    into main memory
  • DiskWrite(x): writes the page containing object x
    into secondary storage
  • Assume pages no longer in use are flushed from
    main memory
  • Usually want a B-tree node to be the size of a
    whole disk page
  • For simplicity, ignore the data information; in
    practice it is most common to store with each key a
    pointer to another disk page with the data.

10
B-Tree Definition
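  • A B-Tree is a rooted tree whose nodes have the
    following properties.
  • 1. Every node x has the following fields
  • $n[x]$, the number of keys stored in x
  • The $n[x]$ keys themselves, stored in sorted order:
    $key_1[x] \le key_2[x] \le \dots \le key_{n[x]}[x]$
  • $leaf[x]$ is TRUE if x is a leaf, and FALSE
    otherwise
  • 2. If x is an internal node, x has $n[x]+1$ children,
    which are accessed by pointers
    $c_1[x], c_2[x], \dots, c_{n[x]+1}[x]$
  • (analogy with left[x] and right[x] in a binary
    tree)
  • 3. The keys in a node x separate the ranges of the
    keys stored in the children

[Figure: node x with keys $key_1[x], key_2[x], \dots, key_{n[x]}[x]$ and child pointers $c_1[x], \dots, c_{n[x]+1}[x]$. Keys under $c_1[x]$ are $\le key_1[x]$; keys under $c_i[x]$ are $\ge key_{i-1}[x]$ and $\le key_i[x]$; keys under $c_{n[x]+1}[x]$ are $\ge key_{n[x]}[x]$.]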
11
B-Tree Definition (cont)
  • 4. Every leaf has the same depth, the height of
    the tree h
  • 5. Let $t \ge 2$ be an integer, the minimum branching
    factor (the minimum out-degree of the B-Tree).
  • Every node except the root must have at least t-1
    keys and thus at least t children ($n[x] \ge t-1$)
  • If the tree is not empty, $n[root[T]] \ge 1$, and thus
    the root (if it is not a leaf) has at least 2
    children
  • Every node contains at most 2t-1 keys, and thus has
    at most 2t children
  • Thus the BF of the root is between 2 and 2t; each
    internal node other than the root has BF between t
    and 2t
  • Example: t = 2, every internal node has
    between 2 and 4 children (2-3-4 tree)
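
The five properties can be turned into a small checker. The following is a minimal sketch, not the textbook's representation: the `Node` class (a leaf is simply a node with no children) and the `check` function are names assumed here for illustration, and an empty tree is not handled.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list = field(default_factory=list)
    children: list = field(default_factory=list)   # empty list means "leaf"

def check(node, t, is_root=True, lo=None, hi=None):
    """Return the height of the subtree if all B-tree properties hold."""
    n = len(node.keys)
    assert n <= 2 * t - 1, "more than 2t-1 keys"
    assert n >= (1 if is_root else t - 1), "too few keys"
    assert node.keys == sorted(node.keys), "keys not sorted"
    for k in node.keys:                              # keys respect the parent's range
        assert (lo is None or lo <= k) and (hi is None or k <= hi)
    if not node.children:
        return 0
    assert len(node.children) == n + 1, "internal node needs n+1 children"
    bounds = [lo] + node.keys + [hi]
    heights = {check(c, t, False, bounds[i], bounds[i + 1])
               for i, c in enumerate(node.children)}
    assert len(heights) == 1, "leaves at different depths"
    return 1 + heights.pop()

# The example tree from slide 8, read as a B-tree with t = 2.
root = Node(["M"], [
    Node(["D", "H"], [Node(["B", "C"]), Node(["F", "G"]), Node(["J", "K", "L"])]),
    Node(["Q", "T", "X"], [Node(["N", "P"]), Node(["R", "S"]),
                           Node(["V", "W"]), Node(["Y", "Z"])]),
])
height = check(root, t=2)
print("valid B-tree of height", height)
```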

12
The Height of B-Tree
  • The number of disk accesses for each operation is
    bounded by the height, thus O(h)
  • Theorem: If $n \ge 1$, then for any n-key B-tree T
    of height h and minimum BF $t \ge 2$,
    $h \le \log_t \frac{n+1}{2}$
  • Proof: If we prove the statement for M, a B-tree of
    height h with the minimum number of keys, then it
    will be true for any tree of height h.

13
B-Tree Height
  • Proof (cont)

[Figure: the minimum-key tree of height h. Root(T) has 2 children; every other internal node has t children.]
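
The counting argument behind the bound can be written out as follows. This is the standard derivation, reconstructed here under the assumption that the slide followed it: in the minimum-key tree the root holds 1 key, there are at least $2t^{i-1}$ nodes at depth $i \ge 1$, and each of those holds at least $t-1$ keys.

```latex
\begin{align*}
n &\ge 1 + (t-1)\sum_{i=1}^{h} 2t^{\,i-1} \\
  &= 1 + 2(t-1)\,\frac{t^{h}-1}{t-1} \\
  &= 2t^{h} - 1 ,
\end{align*}
so $t^{h} \le \frac{n+1}{2}$, and taking $\log_t$ of both sides gives
$h \le \log_t \frac{n+1}{2}$.
```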
14
Basic Operations
  • Assume root(T) is always in main memory, so we never
    do DiskRead on the root; however, we must do
    DiskWrite when the root is changed.
  • Searching: straightforward generalization of BST
    search
  • Ex: search S
  • Complexity
  • to find (or fail to find) the node: O(h) nodes are
    visited, where $h \le \log(n+1)/\log t$
  • At each node, O(log t) to do binary search on the
    sorted keys and decide which child to go to
  • 1 DiskRead to get the page containing the child

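[Figure: the same example tree: Root(T) is (M); its children are (D H) and (Q T X); the leaves are (B C), (F G), (J K L) | (N P), (R S), (V W), (Y Z).]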
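
The search described above can be sketched as follows. This is an illustrative in-memory version: the `Node` class and the `stats` counter, which stands in for DiskRead, are assumptions made here, and the binary search uses Python's `bisect_left`.

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)      # empty for a leaf

def search(x, k, stats):
    """Return (node, index) of key k in the subtree rooted at x, or None."""
    i = bisect_left(x.keys, k)              # O(log t) binary search inside the node
    if i < len(x.keys) and x.keys[i] == k:
        return x, i
    if not x.children:                      # reached a leaf without finding k
        return None
    stats["disk_reads"] += 1                # stands in for DiskRead(c_i[x])
    return search(x.children[i], k, stats)

# The example tree from the slides; the root is assumed to be in memory already.
root = Node("M", [
    Node("DH", [Node("BC"), Node("FG"), Node("JKL")]),
    Node("QTX", [Node("NP"), Node("RS"), Node("VW"), Node("YZ")]),
])
stats = {"disk_reads": 0}
node, i = search(root, "S", stats)
print(node.keys, i, stats)                  # S is found in leaf (R S) after 2 reads
```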
15
Basic B-Tree Operations
  • Creating an empty tree: O(1) time
  • Splitting a node
  • An important operation for insertion is splitting a
    full node y (with 2t-1 keys) around its median
    key into 2 nodes having t-1 keys each.
  • The median key moves into y's parent (which must
    not be full prior to splitting y)
  • Ex: t = 4, max 7 keys in a node, max BF 8
  • If y is the root, the tree grows in height by 1

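[Figure: splitting with t = 4. Before: x = (N W) with subtrees Ta, Tb, Tc, where Tb is rooted at the full node y = (P Q R S T U V) with children 1 to 8. After: x = (N S W) with subtrees Ta, Tb1, Tb2, Tc, where Tb1 is rooted at (P Q R) with children 1 to 4 and Tb2 is rooted at (T U V) with children 5 to 8.]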
16
Basic Operations Split (cont)
  • B_Tree_split-child(x,i,y)
  • splits the full child y of the non-full node
    x (already read into memory) into two subtrees
  • The median key moves into x
  • Complexity of Split
  • Time: Theta(t), to copy half of the pointers and
    keys into the new node and remove them from y
  • Disk accesses: allocate one node on disk and write
    3 nodes to disk, O(1)
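
A minimal in-memory sketch of the split, assuming a simple `Node` class with Python lists for keys and children (names chosen here for illustration; disk allocation and the three DiskWrite calls are omitted):

```python
class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []      # empty for a leaf

def split_child(x, i, t):
    """Split the full child y = x.children[i] around its median key."""
    y = x.children[i]
    assert len(y.keys) == 2 * t - 1, "y must be full"
    assert len(x.keys) < 2 * t - 1, "x must not be full"
    z = Node(y.keys[t:], y.children[t:])    # new node gets the upper t-1 keys
    median = y.keys[t - 1]
    y.keys, y.children = y.keys[:t - 1], y.children[:t]
    x.keys.insert(i, median)                # the median moves up into x
    x.children.insert(i + 1, z)             # z becomes y's right neighbour
    return z

# The t = 4 example from the slides; the two outer children are placeholders.
y = Node(list("PQRSTUV"))
x = Node(["N", "W"], [Node(["A"]), y, Node(["Z"])])
z = split_child(x, 1, t=4)
print(x.keys, y.keys, z.keys)
```

Only a constant number of nodes (x, y, z) change, which is why the disk cost is O(1) even though Theta(t) keys are copied.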

17
Basic Operations Insert
  • Idea
  • Use a single pass going down the tree, as for
    search: a search is performed to locate the leaf in
    which to insert the new key. At each non-full
    node a binary search is performed to decide
    which subtree to follow
  • Whenever a full node is encountered on the search
    path, split it, and continue the insert recursively
    in one of the newly created subtrees.
  • Start at the root;
  • if it is full, prior to continuing, create a new
    node and split the root, pushing the median key of
    the root up into the new node. (This is the only
    way B-Tree height grows.)

[Figure: splitting a full root. Before: root[T] = (A D F H L N P). After: root[T] = (H) with children (A D F) and (L N P).]
18
Basic Operations Insert (cont)
  • The procedure in the textbook implements this
    one-pass insert.
  • It starts from the root;
  • if the root is full, prior to continuing, it will
    create a new node and split the root, pushing the
    median key of the root up into the new node. The key
    is always inserted in a non-full leaf
    (terminating condition for the recursion).
  • During the search the procedure detects a full
    child that must be visited and splits it prior
    to making a recursive call on one of the two new
    children. This guarantees that when a key
    is inserted into a leaf, the leaf is non-full.
  • Complexity of Insert
  • The number of disk accesses (nodes read) is O(h);
    there are at most h splits, thus at most O(h) nodes
    allocated.
  • The CPU time is O(th).
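
The one-pass insert can be sketched as below. This is an in-memory illustration rather than the textbook procedure itself: the names `Node`, `split_child`, `insert_nonfull`, `insert`, and `inorder` are assumptions made here, disk reads and writes are left out, and keys are assumed to be distinct.

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []          # empty for a leaf

def split_child(x, i, t):
    """Split the full child x.children[i]; its median key moves up into x."""
    y = x.children[i]
    z = Node(y.keys[t:], y.children[t:])
    x.keys.insert(i, y.keys[t - 1])
    x.children.insert(i + 1, z)
    y.keys, y.children = y.keys[:t - 1], y.children[:t]

def insert_nonfull(x, k, t):
    """Insert k below x, splitting every full child met on the way down."""
    while x.children:
        i = bisect_left(x.keys, k)
        if len(x.children[i].keys) == 2 * t - 1:
            split_child(x, i, t)
            if k > x.keys[i]:                   # k belongs in the new right half
                i += 1
        x = x.children[i]
    x.keys.insert(bisect_left(x.keys, k), k)    # x is a non-full leaf here

def insert(root, k, t):
    """Insert k and return the (possibly new) root."""
    if len(root.keys) == 2 * t - 1:             # full root: the height grows by 1
        root = Node(children=[root])
        split_child(root, 0, t)
    insert_nonfull(root, k, t)
    return root

def inorder(x):
    """All keys of the subtree rooted at x in sorted order."""
    out = []
    for i, k in enumerate(x.keys):
        if x.children:
            out += inorder(x.children[i])
        out.append(k)
    if x.children:
        out += inorder(x.children[-1])
    return out

root = Node()
for key in "FSQKCLHTVWMRNPABXYDZE":             # arbitrary insertion order
    root = insert(root, key, t=3)
print(inorder(root))
```

Splitting on the way down means the parent of any node being split is never full, so no second upward pass is needed.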

19
Insert Example
  • Given t = 3, a full node has 5 keys
  • Insert C

[Figure: C is inserted in a non-full leaf.]
20
Insert Example
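  • Given t = 3
  • Insert Q

[Figure: a full leaf is split and its median key T moves up. After the insert: root (G M P T X); leaves (A B C D E), (J K), (N O), (Q R S), (U V), (Y Z).]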
21
Insert Example
  • Given t = 3
  • Insert L

[Figure, before: the root (G M P T X) is full; leaves (A B C D E), (J K), (N O), (Q R S), (U V), (Y Z). After splitting the full root: root (P) with children (G M) and (T X); leaves unchanged. After the insert: L goes into leaf (J K), giving (J K L).]
22
Insert Example
  • Given t = 3
  • Insert F

[Figure: the leaf on the search path is full, so it is split before F is inserted.]
23
Basic Operation Deletion
  • The key idea is to
  • ensure as you move down the tree that the node
    to visit (i.e. on the path from the root to the
    node with the key to be deleted) has at least
    1 + (t-1) = t keys; if not, we'll rearrange the
    tree before continuing
  • this way, if a key is deleted from a node, the
    minimum number of keys is still maintained
  • Let x be the current node when searching for the
    node with key k to delete
  • Case 1: x is a leaf, just delete k from x
  • Case 2: k is in x
  • let y be the child before k and z the child
    after
  • Case 2a: at least t keys in y
  • Case 2b: at least t keys in z
  • Case 2c: t-1 keys in both y and z (will have to
    rearrange)
  • Note: y and z could be leaves
24
Deletion, Cases 2a, 2b, 2c: x has the key
  • 2a (y has at least t keys): keep k's predecessor in
    memory, put it in x in place of k, and delete the
    predecessor from y (recursively)
  • 2b (z has at least t keys): symmetric; put k's
    successor in x in place of k, and delete the
    successor from z (recursively)
  • 2c (y and z have t-1 keys each): merge the nodes y
    and z, moving k down as the median key of the new
    node, then delete k recursively from the new node.
    Note that if x was the root with the single key k,
    the height shrinks.

[Figure: example with x = (c k o), where j is k's predecessor. 2a: x becomes (c j o). 2c: x becomes (c o) and the merged child is (f k m).]
25
Basic Operation Deletion (cont)
  • The key idea is to
  • ensure as you move down the tree that at the
    nodes visited, the number of keys is always at
    least t, one more than the minimum number allowed
  • Let x be the current node when searching for the
    node with key k to delete
  • Case 1: x is a leaf, just delete k from x
  • Case 2: k is in x
  • Case 3: k is not in the current node x and the
    node z we want to go to next has t-1 keys (need
    to rearrange the tree); let y be a sibling of z
  • Case 3a: at least t keys in y
  • Case 3b: t-1 keys in each of y and z
  • Note: the roles of y and z can be interchanged (the
    rotation will change direction, see next)
  • Also, y and z could be leaves

26
Deletion, Cases 3a, 3b: x does not have the key,
node to go to next has t-1 keys
  • x is the current node and has at least t keys; z is
    the child we want to go to, but z has only t-1 keys;
    y is a sibling of z
  • 3a (at least t keys in y): do a rotation-like move
    through x: a key of x moves down into z and a key of
    y moves up into x, so the node to visit has t keys
  • 3b (t-1 keys in y and in z): merge nodes y and z,
    dropping a key of x as the median key of the merged
    node
  • Then continue the search at the rearranged node

[Figure: example, delete k, with x = (c i o), y = (g h), z = (j). 3a: x becomes (c h o), y becomes (g), z becomes (i j). 3b: x becomes (c o) and the merged node is (g h i j).]
27
Example 1
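x is not a leaf and does not have the key; all nodes
below the root on the path to the leaf have 3 keys
  • Given B-tree rooted in x, t = 3, delete F

[Figure, before: root x = (P); children (C G M), (T X); leaves (A B), (D E F), (J K L), (N O) | (Q R S), (U V), (Y Z). After (case 1): leaf (D E F) becomes (D E); the rest is unchanged.]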
28
Example 1
x is not a leaf and has the key; y, the left child,
has t keys; put the predecessor of the key up
  • t = 3, delete M

[Figure, case 2a: x = (C G M), y = (J K L). After: x = (C G L), y = (J K); the rest of the tree is unchanged.]
29
Example 1
x is not a leaf and has the key; y and z have t-1
keys; merge, drop the key, and recursively delete
  • t = 3, delete G

[Figure, case 2c: x = (C G L), y = (D E), z = (J K). After the merge: x = (C L) with child (D E G J K), from which G is then deleted recursively.]
30
Example 1
x is not a leaf and does not have the key; z, the node
to go to next, has t-1 keys; y has t-1 keys too; merge
y and z, drop the root key, and recursively delete
  • t = 3, delete D

[Figure, case 3b: x = (P), z = (C L), y = (T X). After the merge: root (C L P T X); leaves (A B), (D E J K), (N O), (Q R S), (U V), (Y Z). The height shrinks.]
31
Example 1
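x is not a leaf and does not have the key; z, the node
to go to next, has t-1 keys; y has t keys; do a
rotation-like move from right to left and recursively delete
  • t = 3, delete B

[Figure, case 3a: x = (C L P T X), z = (A B), y = (E J K). After the rotation: x = (E L P T X), z = (A B C), y = (J K); B is then deleted from z.]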
32
Example 2
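x does not have H; the node to visit has 1 key, its
sibling has 2
  • Given B-tree rooted in x, t = 2, delete H

[Figure, before: root x = (K); children (F), (P W); next level (B), (H) | (M), (S U), (Y); leaves (A), (C D) | (G), (J) | (L), (N) | (Q R), (T), (V) | (X), (Z). After case 3a: root (P); children (F K), (W); (F K) has children (B), (H), (M); (W) has children (S U), (Y); the leaves are unchanged.]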
33
Example 2
x does not have H; the node to visit has 1 key, its
sibling has 1 too
  • delete H (cont)

[Figure, case 3b (merge and drop): x = (F K) with children (B), (H), (M). After: x = (F) with children (B), (H K M); (H K M) has leaves (G), (J), (L), (N).]
34
Example 2
x has H and is not a leaf; y and z have 1 key each
  • delete H (cont)

[Figure, case 2c (merge and drop): x = (H K M), y = (G), z = (J). After: (K M) with leaves (G H J), (L), (N). Now x = (G H J) is a leaf with H, so H is deleted (case 1).]
35
Example 2
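x does not have L; the node to visit has 1 key,
its sibling has 1 too
  • Result from Delete H, now delete L

[Figure, before: root x = (P); children (F), (W); (F) has children (B), (K M); (W) has children (S U), (Y); (K M) has leaves (G J), (L), (N). After case 3b (merge and drop): root (F P W) with children (B), (K M), (S U), (Y). The height shrinks.]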
36
Example 2
x is not a leaf and does not have the key; we need to
go to z, a node with 1 key; y has 2
  • Delete L

[Figure, case 3a: x = (K M), y = (G J), z = (L). After the rotation: x = (J M), y = (G), z = (K L); L is then deleted from z.]
37
B-Tree Delete
  • Recall BST delete: delete a key from
  • a leaf
  • an internal node with one child
  • an internal node with two children
  • Delete a key k from B-Tree T rooted at x.
  • The node x is in memory.
  • Go in one pass, from the root down
  • The procedure is always called recursively on a
    tree rooted in a node with at least t keys; one
    of these keys might have to be pushed down to a
    child before continuing down
  • If it ever happens that the root x is left with
    no keys (may happen in 2c or 3b), the only child
    of x becomes the root, decreasing the height.
  • Only the root may become empty (all others have
    at least 1 key after manipulation)
  • Next, we just sketch the pseudo-code with the
    above understanding
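
Since the pseudo-code slides are not part of this transcript, the following is only a sketch of how cases 1, 2a-2c and 3a-3b could be put together, not the author's or the textbook's code. The names `Node`, `merge`, `fill`, `delete` and `delete_key` are assumptions, everything is kept in memory, and disk I/O is omitted.

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []           # empty for a leaf

def merge(x, i):
    """Cases 2c/3b: merge children i and i+1 of x around the key x.keys[i]."""
    y, z = x.children[i], x.children.pop(i + 1)
    y.keys += [x.keys.pop(i)] + z.keys
    y.children += z.children
    return y

def fill(x, i, t):
    """Case 3: give child i of x at least t keys; return its (possibly new) index."""
    c = x.children[i]
    if i > 0 and len(x.children[i - 1].keys) >= t:            # 3a: borrow from the left
        left = x.children[i - 1]
        c.keys.insert(0, x.keys[i - 1])
        x.keys[i - 1] = left.keys.pop()
        if left.children:
            c.children.insert(0, left.children.pop())
    elif i < len(x.keys) and len(x.children[i + 1].keys) >= t:  # 3a: borrow from the right
        right = x.children[i + 1]
        c.keys.append(x.keys[i])
        x.keys[i] = right.keys.pop(0)
        if right.children:
            c.children.append(right.children.pop(0))
    else:                                                      # 3b: merge with a sibling
        if i == len(x.keys):
            i -= 1
        merge(x, i)
    return i

def delete(x, k, t):
    """Delete k from the subtree rooted at x in one downward pass."""
    i = bisect_left(x.keys, k)
    if i < len(x.keys) and x.keys[i] == k:
        if not x.children:                       # case 1: k is in a leaf
            x.keys.pop(i)
            return
        y, z = x.children[i], x.children[i + 1]
        if len(y.keys) >= t:                     # case 2a: use the predecessor
            while y.children:
                y = y.children[-1]
            x.keys[i] = y.keys[-1]
            delete(x.children[i], x.keys[i], t)
        elif len(z.keys) >= t:                   # case 2b: use the successor
            while z.children:
                z = z.children[0]
            x.keys[i] = z.keys[0]
            delete(x.children[i + 1], x.keys[i], t)
        else:                                    # case 2c: merge, then recurse
            delete(merge(x, i), k, t)
    elif x.children:                             # k, if present, is further down
        if len(x.children[i].keys) < t:
            i = fill(x, i, t)
        delete(x.children[i], k, t)

def delete_key(root, k, t):
    """Delete k and return the (possibly new) root."""
    delete(root, k, t)
    if not root.keys and root.children:          # empty root: the height shrinks
        root = root.children[0]
    return root

# Example 1 from the slides (t = 3): delete F, M, G, D, B in turn.
root = Node(["P"], [
    Node(list("CGM"), [Node(list("AB")), Node(list("DEF")),
                       Node(list("JKL")), Node(list("NO"))]),
    Node(list("TX"), [Node(list("QRS")), Node(list("UV")), Node(list("YZ"))]),
])
for key in "FMGDB":
    root = delete_key(root, key, t=3)
print(root.keys, [c.keys for c in root.children])
```

Run on Example 1, the sketch ends with root (E L P T X) and first leaf (A C), matching the trees shown on the example slides once B has been removed.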

38
(No Transcript)
39
(No Transcript)