The BTree Index - PowerPoint PPT Presentation

1 / 56
About This Presentation
Title:

The BTree Index

Description:

Directory nodes below the root are called internal nodes. ... loaded in a left-to-right fashion so that successive inserts normally occur to ... – PowerPoint PPT presentation

Number of Views:298
Avg rating:3.0/5.0
Slides: 57
Provided by: ngot
Learn more at: http://www.cs.sjsu.edu
Category:
Tags: btree | index

less

Transcript and Presenter's Notes

Title: The BTree Index


1
The B-Tree Index
Lecture 22 CS 157A
  • Prof. Sin-Min Lee
  • Department of Computer Science
  • San Jose State University

2
Outline
  • Introduction
  • The B-tree shape
  • Dynamic changes in the B-tree
  • Properties of the B-tree
  • Create index statement syntax
  • More about the B-tree
  • Summary

3
Introduction
  • A B-tree is a keyed index structure, comparable
    to a number of memory resident keyed lookup
    structures such as balanced binary tree, the AVL
    tree, and the 2-3 tree.
  • The difference is that a B-tree is meant to
    reside on disk, being made partially
    memory-resident only when entries int the
    structure are accessed.
  • The B-tree structure is the most common used
    index type in databases today.
  • It is provided by ORACLE, DB2, and INGRES.

4
(No Transcript)
5
The B-Tree Shape
  • A B-tree is built upside down with the root at
    the top and the leaves at the bottom.
  • All nodes above the leaf level, including the
    root, are called directory nodes or index nodes.
  • Directory nodes below the root are called
    internal nodes.
  • The root node is known as level 1 of the B-tree
    and successively lower levels are given
    successively larger level numbers with the leaf
    nodes at the lowest level.
  • The total number of levels is called the depth
    of the B-tree.

6
  • Balanced and Unbalanced Trees
  • Trees can be balanced or unbalanced.
  • In a balanced tree, every path from the route to
    a leaf node is the same length.
  • A tree that is balanced has at most logorder n
    levels. This is desirable for an index.

7
The Problem of Unbalanced Trees
  • a. A Troublesome Search Tree
  • b. A More Troublesome Search Tree

1
1
2
2
3
3
4
4
5
6
5
7
8
9
8
Disadvantage of unbalanced tree
  • Searching an unbalanced tree may
  • require traversing an arbitrary and
    unpredictable number of nodes and pointers.

9
Unbalanced Tree (cont.)
  • Problems
  • 1. The levels of the tree are only sparsely
    filled
  • 2. Resulting in long
  • 3. Deep paths and defeating the purpose of
    binary trees in the first place.

10
The general name for B-trees is multiway trees.
However, the best known of them have their very
own names 2-3 trees, B trees, and B trees.
Here we will look only at 2-3 trees. They are the
simplest. And or purposes of learning the
underlying principles this is good. Multiway
trees can get pretty complicated pretty fast.
Keep in mind that 2-3 trees as we study them are
rarely built. Their larger cousins, B and B
trees are often used for medium and large
database applications. The general idea of a
multiway tree of order n is that each node can
hold up to n - 1 key values and each node can
have up to n childern. So, a 2-3 tree is actually
a multiway tree of order 3. That means that a 2-3
tree has nodes that can hold 1 or 2 values and
that a node can have 0, 1, 2, or 3 children. Thus
the name 2-3 tree
11
The B-Tree Shape (c.)
  • level1 root node, level2 directory nodes,
    level3 leaf nodes

12
(No Transcript)
13
B-Trees
  • For a binary search tree, the search time is
    O(h), where h is the height of the tree.
  • The height cannot be less than roughly log2n for
    a tree with n nodes.
  • If the tree is too large for internal memory,
    each access of a node is an I/O.
  • For a tree with 10,000 nodes
  • log210000 100 disk accesses
  • We need a structure that would require only
    3 or 4 disk accesses.

14
B-Trees - Definition
  • A B-tree of order M is an M-ary tree with
    the following properties
  • (1) The data items are stored at leaves.
  • (2) The nonleaf nodes store up to M - 1 keys
    to guide the searching key i represents the
    smallest key in subtree i 1.

15
  • (3) The root is either a leaf or has between 2
    and M children.
  • (4) All nonleaf nodes (except the root) have
    between ceiling(M/2) and M children.
  • (5) All leaves are at the same depth and have
    between ceiling(L/2) and L data items, for some
    L.

16
(No Transcript)
17
B-tree of order n
  • Every B-tree is of some "order n", meaning nodes
    contain from n to 2n keys (so nodes are always at
    least half full of keys), and n1 to 2n1
    pointers, and n can be any number.
  • Keys are kept in sorted order within each node. A
    corresponding list of pointers are effectively
    interspersed between keys to indicate where to
    search for a key if it isn't in the current node.

18
  • A B-tree of order n is a multi-way search tree
    with two properties
  • 1.All leaves are at the same level
  • 2.The number of keys in any node lies between
    n and 2n, with the possible exception of the root
    which may have fewer keys.

19
Other definition
  • A B-tree of order m is a m-way tree that
    satisfies the following conditions.
  • Every node has lt m children.
  • Every internal node (except the root) has ltm/2
    children.
  • The root has gt2 children.
  • An internal node with k children contains (k-1)
    ordered keys. The leftmost child contains keys
    less than or equal to the first key in the node.
    The second child contains keys greater than the
    first keys but less than or equal to the second
    key, and so on.

20
A B-tree of order 2
21
Dynamic changes in the B-Tree
  • A B-tree is an efficient self-modifying structure
    when new entries are inserted pointing to new
    rows inserted in the indexed table.
  • The nodes at every level are generally assumed
    not to be full.
  • Space is left so that inserts are often possible
    to a node at any level without new disk space
    being required.
  • An insert of a new entry always occurs at the
    leaf level, but occasionally the leaf node is too
    full to simply accept the new entry. In this
    case, for additional space the leaf level node is
    split into two leaf pages.

22
Properties of the B-Tree
  • Assumptions
  • Entry key values can have variable length because
    of variable-length column values appearing in the
    index key.
  • When a node split occurs, equal lengths of entry
    information are placed in the left and right
    split node.
  • Rebalancing actions in the B-tree occur when
    entries are deleted.

23
Properties of the B-Tree (c.)
  • Properties
  • Every node is disk-page sized and resides in a
    well-defined location.
  • Nodes above the leaf level contain directory
    entries, with n-1 separator keys and n disk
    pointers to lower-level B-tree nodes.
  • Nodes at the leaf level contain entries with
    (keyval, rowid) pairs pointing to individual rows
    indexed.
  • All nodes below the root are at least half full
    with entry information.
  • The root node contains at least two entries.

24
Insertion in B-Tree
  • 1. 2.
  • a, g, f,b k

f
a b f g
a b
g k
25
Insertion (cont.)
  • 3. 4.
  • d, h, m j
  • 5. 6.
  • e, s, i, r
    x

f j
f
a b d
g h
k m
a b d
g h k m
f j r
f j
g h i
s x
k m
a b d e
a b d e
k m r s
g h i
26
Insertion (cont.)
c f j r
  • 7.
  • c, l, n, t, u
  • 8.
  • p

s t u x
k l m n
g h i
a b
d e
j
m r
c f
a b
d e
k l
n p
g h i
s t u x
27
Inserting into a B-Tree
  • To insert key value x into a B-Tree
  • Use the B-Tree search to determine on which node
    to make the insertion.
  • Insert x into appropriate position on that leaf
    node.
  • If resulting number of keys on that node lt L,
    then simply output that node to disk and return.
  • Otherwise, split the node.

28
Inserting into a B-Tree Splitting a Node
  • Allocate a new leaf node. Put about half (i.e.,
  • about L/2) of the keys on the new node and
  • leave about half of the keys on the existing
    node.
  • Make appropriate changes to keys and pointers in
    the parent node.
  • If the parent node was already full, then split
    the parent node.
  • The splitting of parents may continue all the way
    back up to the root node.

29
Insert 19,12, 22,15.
30
Insertion
  • Insert the keys in the folowing order into a
    B-tree of order 5.
  • A, G, F, B, K, D, H, M, J, E, S, I, R, X, C,
    L, N, T, U, P.

31
(No Transcript)
32
Searching
  • Searching for an Item in a B-Tree
  • 1. Make a local variable, i, equal to the
    first index such that datai gt target. If
    there is no such index, then set i equal to
    data_count, indicating that none of the entries
    is grater than or equal to the target.
  • 2. if (we found the target at datai)
  • return true
  • else if (the root has no children)
  • return false
  • else
  • return subseti-gtcontains
    (target)

33
Searching (cont.)
  • Example target 10

6 17
19 22
12
4
20
25
2 3
16
10
18
5
34
Deletion form a B-Tree
  • 1. detete h, r

  • s promote s and

  • delete form leaf

j
c f
m r
s t u x
d e
a b
k l
n p
g h i
g i
t u x
35
Deletion (cont.)
  • 2. delete p

  • t pull s
    down

  • pull t up

j
c f
m s
d e
a b
g i
n p
k l
t u x
n s
36
Deletion (cont.)
  • 3. delete d
  • Combine

j
m t
c f
u x
k l
n s
g i
d e
a b
37
Deletion (cont.)
j
  • combine

m t
f
k l
n s
u x
g i
a b c e
f j m t
u x
n s
k l
g i
a b c e
38
Deleting from a B-Tree
  • To delete a key value x from a B-tree, first
    search to determine the leaf node that contains
    x.
  • If removing x leaves that leaf node with
    fewer than the minimum number of keys, try to
    adopt a key from a neighboring node. If thats
    possible, then youre finished.

39
Deleting from a B-Tree (continued)
  • If the neighboring node is already at its
    minimum, combine the leaf node with its
    neighboring node, resulting in one full leaf
    node.
  • This will require restructuring the parent node
    since it has lost a child
  • If the parent now has fewer than the minimum
    keys, adopt a key from one of its neighbors. If
    thats not possible, combine the parent with its
    neighbor.

40
Deleting from a B-Tree (continued)
  • This process may percolate all the way to the
    root.
  • If the root is left with only one child, then
    remove the root node and make its child the new
    root.
  • Both insertion and deletion are O(h), where h is
    the height of the tree.

41
Delete 18
42
Delete 5
43
Delete 19
44
Delete 12
45
Properties of B-Trees
  • (1) In a B-tree of order m with n keys the number
    of nodes, p, satisfies
  • and on average p 1.44 n/m.
  • (2) On average less than
  • nodes are split per insertion

46
Advantages of B-tree
  • Searching a balanced tree means that all leaves
    are at the same depth. There is no runaway
    pointer overhead. Indeed, even very large B-trees
    can guarantee only a small number of nodes must
    be retrieved to find a given key. For example, a
    B-tree of 10,000,000 keys with 50 keys per node
    never needs to retrieve more than 4 nodes to find
    any key.

47
ORACLE Create Index Statement
  • Syntax
  • create unique index indexname on tablename
  • (columnname asc desc , columnname
    asc desc)
  • tablespace tblspacename
  • storage ( initial n next n
    minextents n
  • maxextents n pctincrease n )
  • pctfree n
  • other disk storage and transaction
    clauses not covered
  • or deferred
  • nosort

48
ORACLE Create Index Statement (c.)
  • Explanation
  • The value of n in pctfree can range from 0 to 99
    and this number determines the percentage of each
    B-tree node page.
  • The list of columnnames in parentheses on the
    second line specifies a concatenation of column
    values that make up an index key on the table
    specified.
  • The nosort indicates that the rows already lie on
    disk in sorted order by the key values for this
    index.

49
DB2 Create Index Statement
  • Syntax
  • create unique index indexname on tablename
  • (columname asc desc , columnname asc
    desc)
  • using . . .
  • freepage n
  • pctfree n
  • additional clauses not covered or
    deferred

50
DB2 Create Index Statement (c.)
  • Explanation
  • The using specifies how the index is to be
    constructed from disk files.
  • The integer n of the freepage specifies how
    frequently an empty free page should be left in
    the sequence of pages assigned to the index when
    it is loaded with entries by a DB2 utility.
  • One free page is left for every n index pages
    where n varies from 0 to 255. The defaut value
    for n is 0 meaning that no free pages are left. A
    value of n1 means that alternate disk page are
    left empty.

51
INGRES Create Index Statement
  • Syntax
  • create unique index indexname on table
  • (columname , columnname)
  • with / comas separate
    clauses following /
  • location . . .
  • structure btree isam hash . . .
  • key (columnname , columnname)
  • fillfactor n
  • nonleaffill n
  • additional clauses not covered or
    deferred

52
INGRES Create Index Statement (c.)
  • Explanation
  • The with keyword must be present if any of the
    later coma-separated clauses appear.
  • The location specifies how the index is to be
    constructed from disk files.
  • The structure is unique to INGRES and names the
    access structure that the index will be assigned
    when it is created.
  • The key indicates that the index key value will
    be constructed from the columnnames listed.
  • The fillfactor and nonleaffill gives the
    percentage of node space that should be filled.

53
Index Node Layout and Free Space
  • Below is the schematic layout of a normal
    leaf-level index node with unique key value.
  • Header info
    Free space

keyval rid
keyval rid
54
More about the B-Tree
  • The purpose of the B-tree index is to minimize
    the number of disk I/Os needed to locate a row
    wit a given index key value.
  • The depth of the B-tree bears a close
    relationship to the number of disk I/Os used to
    reach the leaf-level entry where the rowid is
    kept.
  • The nodes of the B-tree are loaded in a
    left-to-right fashion so that successive inserts
    normally occur to the same leaf node held
    consistently in memory buffer.
  • When the leaf node splits, the successive leaf
    node is allocated from the next disk page of the
    allocated extent.

55
More about the B-Tree (c.)
  • Node splits at every level occur in a controlled
    way and allow us to leave just the right amount
    of free space on each page.
  • It is common to estimate the fanout at each level
    to have a value of n where n is expected number
    of entries that appear in each node. Assuming
    that there are n directory entries at the root
    node and every node below that, the number of
    entries at the second level is n2, at third
    level is n3, and so on. For a tree of depth K,
    the number of leaf-level entries is nK just
    before a root split occurs in the tree to make it
    a tree of depth K1.

56
Summary
  • The B-tree is a tree-like structure that helps us
    to organize data in an efficient way.
  • The B-tree index is a technique used to minimize
    the disk I/Os needed for the purpose of locating
    a row with a given index key value.
  • Because of its advantages, the B-tree and the
    B-tree index structure are widely used in
    databases nowadays.
Write a Comment
User Comments (0)
About PowerShow.com